PREFACE


This is the first book in the series, “Advances in Digital Business and 
Enabling Technologies”, which aims to contribute to multi-disciplinary 
research on digital business and enabling technologies, such as cloud com- 
puting, social media, big data analytics, mobile technologies, and the 
Internet of Things, in Europe. This first volume focuses on research that 
extends conventional thinking on cloud computing architecture design to 
better support High Performance Computing (HPC). Meeting the needs
of HPC users provides significant challenges to cloud service providers, 
both technically and culturally, and this book provides a novel approach 
and indicates a future direction for cloud computing architecture research 
that may address a significant portion of these challenges. Given the sig- 
nificant role that HPC plays in scientific advancement and the increasing 
dominance of cloud computing as a global enterprise computing para- 
digm, this book has value to university educators and researchers, indus- 
try, and policy makers. 

The content of the book is based on contributions from researchers on 
the CloudLightning project, a European Union project funded under 
Horizon 2020 (www.cloudlightning.eu). CloudLightning commenced in 
2015 and brought together eight project partners from five countries 
across Europe to create a new way to provision heterogeneous cloud 
resources to deliver services, specified by the user, using a bespoke service 
description language. The goal of CloudLightning is to address energy 
inefficiencies, particularly in the use of resources, and consequently to 
deliver savings to the cloud service provider and the cloud consumer in
terms of reduced power consumption and improved service delivery, with
hyperscale systems particularly in mind. This book is an output of this 
joint research. 

The chapters in the book are organised around key research contribu- 
tions from CloudLightning. Chapter 1 provides a context for HPC and the 
cloud, and discusses how heterogeneous cloud computing might provide 
a solution for certain classes of HPC users. While heterogeneous resources 
can help address performance concerns of HPC users, they also introduce
complexity into an already complex feature space. As such, Chapter 1 also 
introduces four key design principles used by CloudLightning to address 
complexity—emergent behaviour, self-organisation, self-management, and 
the separation of concerns. Chapter 2 presents CloudLightning, a novel 
heterogeneous cloud computing architecture. Chapters 3 and 4 outline 
how approaches to resource management, based on self-organisation, self- 
management, and separation of concerns, help to manage the complexity 
of the heterogeneous cloud. HPC users are not the only stakeholders 
whose needs must be met. While HPC users require performance at orders 
of magnitude greater than the norm, modern cloud service providers 
require scalability at so-called hyperscale. Chapter 5 discusses the chal- 
lenges of evaluating the performance of heterogeneous cloud computing 
architectures at hyperscale and presents a simulation of the proposed solu- 
tion. The book concludes with a brief discussion of the disruptive potential 
of the CloudLightning approach both for high performance computing 
and for hyperscale cloud computing in general. 


Dublin, Ireland Theo Lynn 
Cork, Ireland John P. Morrison 
Dublin, Ireland David Kenny


CHAPTER 1


Addressing the Complexity of HPC 
in the Cloud: Emergence, Self-Organisation, 
Self-Management, and the Separation 
of Concerns 


Theo Lynn 


Abstract New use scenarios, workloads, and increased heterogeneity 
combined with rapid growth in adoption are increasing the management 
complexity of cloud computing at all levels. High performance computing 
(HPC) is a particular segment of the IT market that provides significant 
technical challenges for cloud service providers and exemplifies many of 
the challenges facing cloud service providers as they conceptualise the next 
generation of cloud architectures. This chapter introduces cloud comput- 
ing, HPC, and the challenges of supporting HPC in the cloud. It discusses 
how heterogeneous computing and the concepts of self-organisation, self- 
management, and separation of concerns can be used to inform novel 
cloud architecture designs and support HPC in the cloud at hyperscale.
Three illustrative application scenarios for HPC in the cloud are discussed:
(i) oil and gas exploration, (ii) ray tracing, and (iii) genomics.


T. Lynn
Irish Centre for Cloud Computing (IC4), Dublin City University,
Dublin, Ireland

e-mail: theo.lynn@dcu.ie


Keywords Cloud computing · High performance computing ·
Emergent systems · Self-organising systems · Self-managing systems ·
Heterogeneous computing


1.1 INTRODUCTION 


The objective of this book is to introduce readers to CloudLightning, an 
architectural innovation in cloud computing based on the concepts of self- 
organisation, self-management, and separation of concerns, showing how 
it can be used to support high performance computing (HPC) in the 
cloud at hyperscale. The remainder of this chapter provides a brief over- 
view of cloud computing and HPC, and the challenges of using the cloud 
for HPC workloads. This book introduces some of the major design con- 
cepts informing the CloudLightning architectural design and discusses 
three challenging HPC applications—(i) oil and gas exploration, (ii) ray 
tracing, and (iii) genomics. 


1.2 CLOUD COMPUTING 


Since the 1960s, computer scientists have envisioned global networks 
delivering computing services as a utility (Garfinkel 1999; Licklider 1963). 
These overarching concepts materialised in the form of ARPANET, its
successor the Internet, and more recently cloud computing.
The National Institute of Standards and Technology (NIST) defines
cloud computing as: 


...a model for enabling ubiquitous, convenient, on-demand network access to a 
shared pool of configurable computing resources (e.g., networks, servers, storage,
applications, and services) that can be rapidly provisioned and released with 
minimal management effort or service provider interaction. 

(Mell and Grance 2011, p. 2) 


NIST defines cloud computing as having five essential characteristics,
three service models, and four deployment models as per Table 1.1.


Table 1.1 Cloud computing essential characteristics, service models, and
deployment models (adapted from Mell and Grance 2011)

Essential characteristics
On-demand self-service        Consumers can unilaterally provision computing capabilities
                              as needed automatically without requiring human interaction
                              with the cloud provider.
Broad network access          Capabilities are available over the network and accessed
                              through standard mechanisms that promote use by
                              heterogeneous thin or thick client platforms and interfaces
                              (e.g. devices).
Resource pooling              The provider’s computing resources are pooled to serve
                              multiple consumers using a multi-tenant model, with
                              different physical and virtual resources dynamically
                              assigned and reassigned according to consumer demand.
Rapid elasticity              Capabilities can be elastically provisioned and released, in
                              some cases automatically, to scale rapidly outwards and
                              inwards to meet demand. To the consumer, the capabilities
                              available for provisioning often appear to be unlimited and
                              can be appropriated in any quantity at any time.
Measured service              Cloud systems automatically control and optimise resource
                              use by leveraging a metering capability at some level of
                              abstraction appropriate to the type of service. Resource
                              usage can be monitored, controlled, and reported, providing
                              transparency to the service provider and the consumer.

Service models
Software as a Service         The capability provided to a consumer to use a provider’s
                              applications running on a cloud infrastructure and
                              accessible by client interface.
Platform as a Service         The capability provided to a consumer to deploy onto the
                              cloud infrastructure consumer-created or acquired
                              applications created using development technologies
                              provided by the provider.
Infrastructure as a Service   The capability provided to a consumer to provision
                              computing resources to deploy and run arbitrary software
                              such as operating systems and applications.

Deployment models
Private Cloud                 The cloud infrastructure is provisioned for exclusive use by
                              a single organisation comprising multiple consumers.
                              Ownership, management, and operation of the infrastructure
                              may be done by the organisation, by a third party, or a
                              combination of both, and it may exist on or off premises.
Community Cloud               The cloud infrastructure is provisioned for exclusive use by
                              a specific community of consumers from organisations that
                              have shared concerns. Ownership, management, and operation
                              of the infrastructure may be done by one or more of the
                              organisations in the community, by a third party, or a
                              combination of both, and it may exist on or off premises.
Public Cloud                  The cloud infrastructure is provisioned for open use by the
                              general public. It may be owned, managed, and operated by a
                              business, academic, or government organisation, or some
                              combination of them. It exists on the premises of the cloud
                              provider.
Hybrid Cloud                  The cloud infrastructure is a composition of two or more
                              distinct cloud infrastructures (private, community, or
                              public) that remain unique entities, but are bound together
                              by standardised or proprietary technology that enables data
                              and application portability.


Since the turn of the decade, the number and complexity of cloud
providers offering one or more of the primary cloud service models, namely
Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS), and
Software-as-a-Service (SaaS), as private, public, community, and hybrid clouds
have increased. Cloud computing is now considered to be the dominant
computing paradigm in enterprise Information Technology (IT) and the 
backbone of many software services used by the general public, including 
search, email, social media, messaging, and storage. Enterprises are 
attracted by the convergence of two major trends in IT—IT efficiencies 
and business agility, enabled by scalability, rapid deployment, and paral- 
lelisation (Kim 2009). Figure 1.1 summarises the strategic motivations 
for cloud adoption. 

Despite its ubiquity, cloud computing is dominated by a small number 
of so-called hyperscale cloud providers, companies whose underlying 
cloud infrastructure and revenues from cloud services are at a different 
order of magnitude to all the others. These include companies who offer 
a wide range of cloud services such as Microsoft, Google, Amazon Web 
Services (AWS), IBM, Huawei and Salesforce.com, as well as companies 
whose core businesses leverage the power of cloud to manage the scale of 
their, typically online, operations such as Facebook, Baidu, Alibaba and 
eBay. Estimates suggest that such companies operate one to three million 
or more servers worldwide (Data Center Knowledge 2017; Clark 2014). 
Research by Cisco (2016) suggests that these hyperscale operators number
as few as 24 companies operating approximately 259 data centres in
2016. By 2020, these companies are projected to account for 47% of all
installed data centre servers and 83% of the public cloud server installed
base (86% of public cloud workloads), serving billions of users worldwide
(Cisco 2016).

The data centres operated by hyperscale cloud service providers are 
sometimes referred to as Warehouse Scale Computers (WSCs) to differentiate 
them from other data centres. The data centre(s) hosting WSCs are typically
not shared. They are operated by one organisation to run a small number
of high-use applications or services, and are optimised for those applications
and services. They are characterised by hardware and system software
platform homogeneity, a common systems management layer, a greater 
degree of proprietary software use, single organisation control, and a focus 
on cost efficiency (Barroso and Hölzle 2007). It is important also to note
that for these hyperscale clouds, the clouds, per se, sit on top of the physi- 
cal data centre infrastructure and are abstracted from end-user applica- 
tions, end users, and software developers exploiting the cloud. Indeed, 
hyperscale clouds operate across multiple data centres typically organised 
by geographic region. This abstraction, combined with homogeneity, pro- 
vides cloud service providers with cost efficiencies and deployment flexi- 
bility allowing cloud service providers to maintain, enhance, and expand 
the underlying cloud infrastructure without requiring changes to software 
(Crago and Walters 2015). Conventionally, cloud computing infrastruc- 
ture performance is improved through a combination of scale-out and 
natural improvements in microprocessor capability, while service availabil- 
ity is assured through over-provisioning. As a result, hyperscale data cen- 
tres are high-density facilities utilising tens of thousands of servers and 
often measure hundreds of thousands of square feet in size. For example, 
the Microsoft data centre in Des Moines, Iowa, is planned to occupy over
1.2 million square feet when it opens in 2019. While this high-density
homogeneous scale-out strategy is effective, it results in significant
energy costs. Servers may be underutilised relative to their peak load
capability, with frequent idle times resulting in disproportionate energy
consumption (Barroso and Hölzle 2007; Awada et al. 2014). Furthermore,
the scale of data centre operations results in substantial cooling-related 
costs, with significant cost and energy impacts (Awada et al. 2014). 
Unsurprisingly, given their focus on cost effectiveness, power optimisation 
is a priority for WSC operators. 
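
The disproportionate energy consumption of underutilised servers can be illustrated with a simple linear power model. The sketch below is illustrative only: the function names, the 200 W peak power, and the assumption that an idle server draws 50% of its peak power are hypothetical values chosen for the example, not figures taken from the studies cited above.

```python
def server_power(utilisation, peak_watts=200.0, idle_fraction=0.5):
    """Linear power model: a fixed idle draw plus a share that grows with load.

    peak_watts and idle_fraction are hypothetical illustration values.
    """
    if not 0.0 <= utilisation <= 1.0:
        raise ValueError("utilisation must be between 0 and 1")
    idle_watts = peak_watts * idle_fraction
    return idle_watts + (peak_watts - idle_watts) * utilisation


def power_per_unit_work(utilisation, **kwargs):
    """Watts drawn per unit of useful work; rises sharply at low utilisation."""
    if utilisation == 0:
        return float("inf")
    return server_power(utilisation, **kwargs) / utilisation


if __name__ == "__main__":
    for u in (0.1, 0.3, 0.5, 1.0):
        print(f"utilisation={u:.0%} power={server_power(u):.0f} W "
              f"watts per unit of work={power_per_unit_work(u):.0f}")
```

Under these assumed values, a server at 10% utilisation still draws 55% of its peak power, so each unit of work costs roughly five and a half times as much energy as it would at full load.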

From a research perspective, WSCs introduce an additional layer of 
complexity over and above smaller-scale computing platforms due to the 
larger scale of the application domain (including an associated deeper and 
less homogeneous storage hierarchy), higher fault rates, and possibly 
higher performance variability (Barroso and Hölzle 2007). This complexity
is further exacerbated by the dilution of homogeneity through technological
evolution and an associated evolving set of use cases and workloads.
More specifically, the emergence of new specialised hardware devices that
can accelerate the completion of specific tasks and networking infrastructure
that can support higher throughput and lower latency is enabling
support for workloads that traditionally would be considered HPC (Yeo
and Lee 2011). The introduction of heterogeneity combined with new
workloads, such as those classified as HPC, will introduce greater
system performance variability, including in response times, and, as a result,
will impact the quality of service. As such, new approaches to provision- 
ing are required. Despite these challenges, cloud service providers have 
sought to enter the HPC market catering largely for batch processing 
workloads that are perfectly or pleasingly parallelisable. Examples include 
AWS Batch, Microsoft Azure Batch, and Google Zync Render. 
Notwithstanding the entry of these major cloud players, cloud is one of 
the smallest segments in the HPC market and vice versa (Intersect360 
Research 2014).


1.3 HIGH PERFORMANCE COMPUTING


HPC typically refers to computer systems that through a combination of 
processing capability and storage capacity rapidly solve difficult computa- 
tional problems (Ezell and Atkinson 2016). Here, performance is gov- 
erned by the (effective) processing speed of the individual processors and 
the time spent in inter-processor communications (Ray et al. 2004). As 
technology has evolved, processors have become faster, can be accelerated, 
and can be exploited by new techniques. Today, HPC systems use parallel 
processing achieved by deploying grids or clusters of servers and proces- 
sors in a scale-out manner or by designing specialised systems with high 
numbers of cores, large amounts of total memory, and high-throughput 
network connectivity (Amazon Web Services 2015). The top tier of these 
specialised HPC systems are supercomputers whose cost can reach up to 
US$100 million. The performance of such supercomputers is measured in
floating-point operations per second (FLOPS) rather than millions of
instructions per second (MIPS), the measure of processing capacity in
general-purpose computing. At the time of writing, the world’s fastest supercomputer, the
Chinese Sunway TaihuLight, has over 10 million cores and a LINPACK 
benchmark rating of 93 petaflops (Feldman 2016; Trader 2017) and a 
peak performance of 125 petaflops (National Supercomputing Centre, 
WuXi n.d.). It is estimated to have cost US$273 million (Dongarra 2016). 
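
The gap between the LINPACK rating and the peak performance quoted above can be expressed as an efficiency ratio. The following is a minimal sketch using only the two figures given in the text; the function name is ours and is not drawn from any benchmark tooling.

```python
def linpack_efficiency(rmax_pflops, rpeak_pflops):
    """Fraction of theoretical peak performance achieved on the benchmark."""
    if rpeak_pflops <= 0:
        raise ValueError("peak performance must be positive")
    return rmax_pflops / rpeak_pflops


# Figures quoted in the text for Sunway TaihuLight (petaflops).
taihulight = linpack_efficiency(rmax_pflops=93, rpeak_pflops=125)
print(f"LINPACK efficiency: {taihulight:.1%}")  # prints "LINPACK efficiency: 74.4%"
```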

Traditionally, HPC systems are of two types: message passing (MP)-based
systems and non-uniform memory access (NUMA)-based
systems. MP-based systems are connected using scalable, high-bandwidth,
low-latency inter-node communications (interconnect) (Severance and
Dowd 2010). Instead of using the interconnect to pass messages, NUMA
systems are large parallel processing systems that use the interconnect to 
implement a distributed shared memory that can be accessed from any 
processor using a load/store paradigm (Severance and Dowd 2010).
HPC applications, in turn, can be organised into three categories: tightly
coupled, loosely coupled, and data intensive. The stereotypical HPC
applications run on supercomputers are typically tightly coupled and
written using the Message Passing Interface (MPI) or shared
memory programming models to support high levels of inter-node
communication and high performance storage (Amazon Web Services 2015).
Weather and climate simulations or modelling for oil and gas exploration 
are good examples of tightly coupled applications. Loosely coupled appli- 
cations are designed to be fault tolerant and parallelisable across multiple 
nodes without significant dependencies on inter-node communication or 
high performance storage (Amazon Web Services 2015). Three- 
dimensional (3D) image rendering and Monte Carlo simulations for 
financial risk analysis are examples of loosely coupled applications. A third 
category of HPC application is data-intensive applications. These applica- 
tions may seem similar to the loosely coupled category but are dependent 
on fast reliable access to large volumes of well-structured data (Amazon 
Web Services 2015). More complex 3D-animation rendering, genomics, 
and seismic processing are exemplar applications. 
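
To illustrate why Monte Carlo simulation is considered loosely coupled, the sketch below splits a simulation into independent batches that share no state and exchange no messages; the results are combined only at the end. It estimates pi rather than a financial risk measure purely to keep the example short. The function names, batch sizes, and the use of a thread pool as a stand-in for a set of workers are assumptions made for illustration.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def run_batch(seed, samples):
    """One independent task: count random points that fall inside the unit circle."""
    rng = random.Random(seed)  # private generator, so batches share no state
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(batches=8, samples_per_batch=50_000):
    """Fan the batches out to workers and combine the results at the end."""
    with ThreadPoolExecutor() as pool:
        hits = pool.map(run_batch, range(batches), [samples_per_batch] * batches)
        total_hits = sum(hits)
    return 4.0 * total_hits / (batches * samples_per_batch)


if __name__ == "__main__":
    print(f"pi is approximately {estimate_pi():.3f}")
```

Because each batch depends only on its own seed, a failed batch can simply be rerun and batches can be placed on any available node; in a cloud setting the thread pool would be replaced by separate machines or instances. A tightly coupled application, by contrast, would need to exchange data between tasks while the computation is still running.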

HPC plays an important role in society as it is a cornerstone of scientific 
and technical computing including biological sciences, weather and cli- 
mate modelling, computer-aided engineering, and geosciences. By reduc- 
ing the time to complete the calculations to solve a complex problem and 
by enabling the simulation of complex phenomena, rather than relying
on physical models or testbeds, HPC both reduces costs and accelerates 
innovation. Demand for and interest in HPC remain high because problems
of increasing complexity continue to be identified. Society values solving 
these problems, and the economics of simulation and modelling is believed 
to surpass other methods (Intersect360 Research 2014). As such, HPC is
recognised as playing a pivotal role in both scientific discovery and national
competitiveness (Ezell and Atkinson 2016). International Data 
Corporation (IDC), in a report commissioned for the European 
Commission, highlights the importance of HPC: 


The use of high performance computing (HPC) has contributed significantly 
and increasingly to scientific progress, industrial competitiveness, national and
regional security, and the quality of human life. HPC-enabled simulation is
widely recognized as the third branch of the scientific method, complementing 
traditional theory and experimentation. HPC is important for national and 
regional economies—and for global ICT collaborations in which Europe par- 
ticipates—because HPC, also called supercomputing, has been linked to accel- 
erating innovation. 

(IDC 2015, p. 20) 


Despite the benefits of HPC, widespread use of HPC has been ham- 
pered by the significant upfront investment and indirect operational 
expenditure associated with running and maintaining HPC infrastruc- 
tures. The larger supercomputer installations require an investment of up 
to US$1 billion to operate and maintain. As discussed, performance is the 
overriding concern for HPC users. HPC machines consume a substantial 
amount of energy directly and indirectly to cool the processors. 
Unsurprisingly, heat density and energy efficiency remain major issues
and depend directly on processor type. Increasingly, the HPC
community is focusing beyond mere performance to performance per 
watt. This is particularly evident in the Green500 ranking of
supercomputers.¹ Cursory analysis of the most energy efficient supercomputers
suggests that the use of new technologies such as Graphics Processing Units
(GPUs) results in significant energy efficiencies (Feldman 2016). Other
barriers to greater HPC use include recruitment and retention of suitably 
qualified HPC staff. HPC applications often require configuration and 
optimisation to run on specialised infrastructure; thus, staff are required 
not only to maintain the infrastructure but to optimise software for a spe- 
cific domain area or use case. 
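
The shift from raw performance to performance per watt can change how systems rank. The sketch below uses two entirely hypothetical systems (the names and figures are invented for illustration and are not taken from the Green500 list) to show that the faster system is not necessarily the more energy efficient one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class System:
    name: str
    gflops: float  # sustained performance in gigaflops
    watts: float   # power drawn while delivering that performance

    @property
    def gflops_per_watt(self):
        return self.gflops / self.watts


# Hypothetical systems; the figures are invented for illustration.
systems = [
    System("cpu-only-cluster", gflops=2_000_000, watts=1_000_000),
    System("gpu-accelerated-cluster", gflops=1_500_000, watts=300_000),
]

by_performance = max(systems, key=lambda s: s.gflops)
by_efficiency = max(systems, key=lambda s: s.gflops_per_watt)

print(by_performance.name)  # prints "cpu-only-cluster"
print(by_efficiency.name)   # prints "gpu-accelerated-cluster"
```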


1.4 HPC AND THE CLOUD


At first glance, one might be forgiven for thinking that HPC and cloud 
infrastructures are of a similar hue. Their infrastructure, particularly at 
Warehouse Scale, is distinct from the general enterprise, and both paral- 
lelisation and scalability are important architectural considerations. There 
are high degrees of homogeneity and tight control. However, the primary 
emphasis is very different in each case. The overriding focus in HPC is 
performance and typically optimising systems for a small number of large 
workloads. Tightly coupled applications, such as those in scientific
computing, require parallelism and fast connections between processors to
meet performance requirements (Eijkhout et al. 2016). Performance is
improved through vertical scaling. Where workloads are data intensive, 
data locality also becomes an issue, and therefore, HPC systems often 
require any given server in the system to be not only available and operative
but connected via high-speed, high-throughput, and low-latency network 
interconnects. The advantages of virtualisation, and particularly space and 
time multiplexing, are of no particular interest to the HPC user (Mergen 
et al. 2006). Similarly, cost effectiveness is a much lower consideration. 

In contrast, the primary focus in cloud computing is scalability and not 
performance. In general, systems are optimised to cater for multiple ten- 
ants and a large number of small workloads. In cloud computing, servers 
also must be available and operational, but due to virtualisation, the pre- 
cise physical server that executes a request is not important, nor is the 
speed of the connections between processors provided the resource data- 
base remains coherent (Eijkhout et al. 2016). As mentioned earlier, unlike 
HPC, the cloud is designed to scale quickly for perfectly or pleasingly 
parallel problems. Cloud service providers, such as AWS, are increasingly 
referring to these types of workloads as High Throughput Computing 
(HTC) to distinguish them from traditional HPC on supercomputers. 
Tasks within these workloads can be parallelised easily, and as such, mul- 
tiple machines and applications (or copies of applications) can be used to 
support a single task. Scalability is achieved through horizontal scaling— 
the ability to increase the number of machines or virtual machine instances. 
Cost effectiveness is a key consideration in cloud computing. 

So, while there are technical similarities between the hyperscale cloud 
service providers operating their own Warehouse Scale Computing sys- 
tems and HPC end users operating their own supercomputer systems, the 
commercial reality is that the needs of HPC end users are not aligned with the
traditional operating model of cloud service providers, particularly for 
tightly coupled use cases. Why? HPC end users, driven by performance, 
want access to heterogeneous resources including different accelerators, 
machine architectures, and network interconnects that may be unavailable 
from cloud service providers, obscured through virtualisation technologies,
and/or impeded by multi-locality (Crago et al. 2011). The general
cloud business model assumes that the end user has minimal capacity to
interfere in the physical infrastructure underlying the cloud, and that the
provider can exploit space and time multiplexing through virtualisation to
achieve utilisation and efficiency gains. The challenge for service providers
and HPC end users is one of balancing the need for (i) performance and
scalability and (ii) maximum performance and minimal interference. CloudLightning
argues that this can be achieved through architectural innovation and the 
exploitation of heterogeneity, self-organisation, self-management, and 
separation of concerns. 


1.5 HETEROGENEOUS COMPUTING 


As discussed earlier, cloud computing data centres traditionally leverage 
homogeneous hardware and software platforms to support cost-effective 
high-density scale-out strategies. The advantages of this approach include 
uniformity in system development, programming practices, and overall 
system capability, resulting in cost benefits to the cloud service provider. 
In the case of cloud computing, homogeneity typically refers to a single 
type of commodity processor. However, there is a significant cost to this 
strategy in terms of energy efficiency. While transistors have continued to
shrink, it has not been possible to lower the processor core voltage levels 
to similar degrees. As a result, cloud service providers have significant 
energy costs associated not only with over-provisioning but with cooling 
systems. As such, limitations on power density, heat removal, and related 
considerations require a different architecture strategy for improved pro- 
cessor performance than adding identical, general-purpose cores 
(Esmaeilzadeh et al. 2011; Crago and Walters 2015). 

Heterogeneous computing refers to architectures in which processors or
cores of different types work efficiently and cooperatively
together using shared memory (Shan 2006; Rogers and Fellow
2013). Unlike traditional cloud infrastructure built on the same processor 
architecture, heterogeneity assumes use of different or dissimilar proces- 
sors or cores that incorporate specialised processing capabilities to handle 
specific tasks (Scogland et al. 2014; Shan 2006). Such processors, due to 
their specialised capabilities, may be more energy efficient for specific tasks 
than general-purpose processors and/or can be put in a state where less 
power is used (or indeed deactivated if possible) when not required, thus
maximising both performance and energy efficiency (Scogland et al. 
2014). GPUs, many integrated cores (MICs), and data flow engines 
(DFEs) are examples of co-processor architectures with relatively positive 
computation/power consumption ratios.² These architectures support
heterogeneous computing because they are typically not standalone 
devices but are rather considered as co-processors to a host processor. As 
mentioned previously, the host processor can complete one instruction
stream, while the co-processor can complete a different instruction stream
or type of stream (Eijkhout et al. 2016). 

Modern GPUs are highly parallel programmable processors with high 
computation power. As can be derived from their name, GPUs were origi- 
nally designed to help render images faster; however, wider adoption was 
hindered by the need for specialised programming knowledge. GPUs have
a stream processing architecture that calls for programming models, tools,
and techniques fundamentally different from those widely known for Intel
general-purpose Central Processing Units (CPUs). As general-purpose GPU
programming environments matured, GPUs were used for a wider set of specialist
processing tasks including HPC workloads (Owens et al. 2007; Shi et al. 
2012). Intel’s Many-Integrated Core (MIC) architecture seeks to com- 
bine the compute density and energy efficiency of GPUs for parallel work- 
loads without the need for a specialised programming architecture; MICs 
make use of the same programming models, tools, and techniques as those 
for Intel’s general-purpose CPUs (Elgar 2010). DFEs are fundamentally 
different to GPUs and MICs in that they are designed to efficiently pro- 
cess large volumes of data (Pell and Mencer 2011). A DFE system typically 
contains, but is not restricted to, a field-programmable gate array (FPGA) 
as the computation fabric and provides the logic to connect an FPGA to 
the host, Random Access Memory for bulk storage, interfaces to other 
buses and interconnects, and circuitry to service the device (Pell et al. 
2013). FPGAs are optimised processors for non-floating-point operations 
and provide better performance and energy efficiency for processing large 
volumes of integer, character, binary, and fixed point data (Proaño et al. 
2014). Indeed, DFEs may be very inefficient for processing single values 
(Pell and Mencer 2011). A commonly cited use case for DFEs is high- 
performance data analytics for financial services. In addition to their per- 
formance, GPUs, MICs, and DFEs/FPGAs are attractive to HPC end 
users as they are programmable and therefore can be reconfigured for 
different use cases and applications. For example, as mentioned earlier, 
GPUs are now prevalent in many of the world’s most powerful 
supercomputers. 

It should be noted that while heterogeneity may provide higher com- 
putation/power consumption ratios, there are some significant imple- 
mentation and optimisation challenges given the variance in operation and 
performance characteristics between co-processors (Teodoro et al. 2014). 
Similarly, application operation will depend on data access and the
processing patterns of the co-processors, which may also vary by application
and co-processor type (Teodoro et al. 2014). For multi-tenant cloud com- 
puting, these challenges add to an already complex feature space where 
processors may not easily support virtualisation or where customers may 
require bare-metal provisioning thereby restricting resource pooling 
(Crago et al. 2011). For data-intensive applications, data transmission to
the cloud remains a significant barrier to adoption. Notwithstanding these 
challenges, cloud service providers have entered the HPC space with spe- 
cialised processor offerings. For example, AWS now offers CPUs, GPUs, 
and DFEs/FPGAs, and has announced support for Intel Xeon Phi proces- 
sors (Chow 2017). 


1.6 ADDRESSING COMPLEXITY IN THE CLOUD 
THROUGH SELF-* DESIGN PRINCIPLES 


This chapter previously discussed two computing paradigms—cloud com- 
puting and HPC—being driven by end-user demand for greater scale and 
performance. To achieve these requirements, heterogeneous resources, 
typically in the form of novel processor architectures, are being integrated 
into both cloud platforms and HPC systems. A side effect, however, is 
greater complexity—particularly in the case of hyperscale cloud services 
where the scale of infrastructure, applications, and number of end users is 
several orders of magnitude higher than general-purpose computing and 
HPC. This complexity in such large-scale systems results in significant 
management, reliability, maintenance, and security challenges (Marinescu 
2017). Emergence and the related concept of self-organisation, self- 
management, and the separation of concerns are design principles that 
have been proposed as potential solutions for managing complexity in 
large-scale distributed information systems (Heylighen and Gershenson 
2003; Schmeck 2005; Herrmann et al. 2005; Branke et al. 2006; 
Serugendo et al. 2011; Papazoglou 2012; Marinescu 2017). 

The complexity of hyperscale cloud systems is such that it is effectively 
infeasible for cloud service providers to foresee and manage manually (let 
alone cost effectively) all possible configurations, component interactions, 
and end-user operations on a detailed level due to high levels of dynamism 
in the system. Self-organisation has its roots in the natural sciences and the 
study of natural systems where it has been long recognised that higher-level 
outputs in dynamic systems can be an emergent effect of lower-level 
inputs (Lewes 1875). This is echoed in the field of Computer Science 
through Alan Turing’s observation that “global order arises from local 
interactions” (Turing 1952). De Wolf and Holvoet (2004) define emer- 
gence as follows: 


A system exhibits emergence when there are coherent emergents at the 
macro-level that dynamically arise from the interactions between the parts at the 
micro-level. Such emergents are novel with regards to the individual parts of the 
system. 

(De Wolf and Holvoet 2004, p. 3) 


Based on their review of the literature, De Wolf and Holvoet (2004) iden- 
tify eight characteristics of emergent systems: 


1. Micro-macro effect—the properties, behaviour, structure, and pat- 
terns situated at a higher macro-level that arise from the (inter)actions 
at the lower micro-level of the systems (so-called emergents). 

2. Radical novelty—the global (macro-level) behaviour is novel with 
regard to the individual behaviours at the micro-level. 

3. Coherence—there must be a logical and consistent correlation of 
parts to enable emergence to maintain some sense of identity over 
time. 

4. Interacting parts—parts within an emergent system must interact as 
novel behaviour arises from interaction. 

5. Dynamical—emergents arise as the system evolves over time; new 
attractors within the system appear over time and as a result new 
behaviours manifest. 

6. Decentralised control—no central control directs the macro-level 
behaviour; local mechanism influences global behaviour. 

7. Two-way link—there is a bidirectional link between the upper 
(macro-) and lower (micro-) levels. The micro-level parts interact 
and give rise to the emergent structure. Similarly, macro-level prop- 
erties have causal effects on the micro-level. 

8. Robustness and flexibility—the fact that no single entity can have a 
representation of the global emergent, combined with decentralised 
control, implies that no single entity can be a single point of failure. 
This introduces greater robustness, flexibility, and resilience. Failure is 
likely to be gradual rather than sudden in emergent systems. 

Self-organising systems are similar in nature to emergent systems. 
Ashby (1947) defined a system as being self-organising where it is “at the 
same time (a) strictly determinate in its actions, and (b) yet demonstrates 
a self-induced change of organisation.” Heylighen and Gershenson (2003) 
define organisations as “structure with function” and self-organisation as 
a functional structure that appears and maintains spontaneously. Again, 
based on an extensive review of the literature, De Wolf and Holvoet 
(2004) offer a more precise definition of self-organisation as “a dynamical 
and adaptive process where systems acquire and maintain structure them- 
selves, without external control.” This definition is consistent with 
Heylighen and Gershenson (2003) while at the same time giving greater 
insight. De Wolf and Holvoet (2004) synthesise the essential characteris- 
tics of self-organising systems as: 


1. Increase in order—an increase in order (or statistical complexity), 
through organisation, is required from some form of semi-organised 
or random initial conditions to promote a specific function. 

2. Autonomy—this implies the absence of external control or interfer- 
ence from outside the boundaries of the system. 

3. Adaptability or robustness with respect to changes—a self-organising 
system must be capable of maintaining its organisation autono- 
mously in the presence of changes in its environment. It may gener- 
ate different tasks but maintain the behavioural characteristics of its 
constituent parts. 

4. Dynamical—self-organisation is a process from dynamism towards 
order. 


The concept of self-organisation is often conflated with emergence, par- 
ticularly in Computer Science due to the dynamism and robustness 
inherent in the systems and, frankly, historical similarity of language. While 
both emergent systems and self-organising systems are dynamic over time, 
they differ in how robustness is achieved. They can exist in isolation or in 
combination with each other. For example, Heylighen (1989) and Mamei 
and Zambonelli (2003) see emergent systems arising as a result of a self- 
organising process thus implying self-organisation occurs at the micro- 
level. In contrast, Parunak and Brueckner (2004) consider self-organisation 
as an effect at the macro-level of emergence as a result of increased order. 
Sudeikat et al. (2009) note that the systematic design of self-organising 
systems is scarcely supported and therefore presents a number of 
challenges to developers including: 


• Architectural design including providing self-organising dynamics as 
software components and application integration 

• Methodological challenges including conceptual but practical means 
for designing self-organising dynamics by refining coordination 
strategies and supporting validation of explicit models for 
self-organised applications 


Despite these challenges, De Wolf and Holvoet (2004) conclude that, for 
hugely complex systems, “...we need to keep the individuals rather simple 
and let the complex behaviour self-organise as an emergent behaviour 
from the interactions between these simple entities.” 
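
The idea of simple individuals producing ordered global behaviour can be illustrated with a minimal sketch. It assumes a hypothetical ring of nodes, each of which knows only its own load and that of its two neighbours and repeatedly shifts a fraction of any difference towards them. The ring topology, the exchange rate `alpha`, and the function names are illustrative assumptions and are not taken from the works cited above. No node holds a global view, yet the load evens out across the ring.

```python
def diffuse_step(loads, alpha=0.25):
    """One synchronous round of purely local exchanges on a ring.

    Each node compares its load with its left and right neighbours and
    moves a fraction `alpha` of each difference. No node sees the whole ring.
    """
    n = len(loads)
    return [
        loads[i]
        + alpha * ((loads[(i - 1) % n] - loads[i]) + (loads[(i + 1) % n] - loads[i]))
        for i in range(n)
    ]


def run(loads, rounds):
    """Apply the local rule repeatedly and return the resulting loads."""
    for _ in range(rounds):
        loads = diffuse_step(loads)
    return loads


def spread(loads):
    """Macro-level measure of disorder: gap between busiest and idlest node."""
    return max(loads) - min(loads)


# All work starts on a single node; the numbers are invented for illustration.
initial = [100.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
final = run(initial, 200)
print(round(spread(initial), 3), round(spread(final), 3))
```

The total load is conserved at every step because each exchange is symmetric, and the balanced state is a macro-level property that no individual node computes or stores.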

The concept of self-management is considerably better defined in the 
Computer Science literature and has its roots in autonomic computing 
(Zhang et al. 2010). The concept of autonomic computing was popular- 
ised by IBM in a series of articles starting in 2001 with Horn’s “Autonomic 
Computing: IBM’s Perspective on the State of Information Technology.” 
These ideas were further elaborated by Kephart and Chess (2003) and 
Ganek and Corbi (2003) amongst others. For IBM, autonomic comput- 
ing was conceptualised as “computing systems that can manage them- 
selves given high-level objectives from administrators” (Kephart and Chess 
2003). Kephart and Chess (2003) further elaborated the essence of auto- 
nomic computing systems through four aspects of self-management—self- 
configuration, self-optimisation, self-healing, and self-protection. In line 
with autonomic computing, the core mechanism of any self-management is the 
use of control or feedback loops, such as 
Monitor-Analyse-Plan-Execute-Knowledge (MAPE-K), that collect details from the system and act 
accordingly, anticipating system requirements and resolving problems 
with minimal human intervention (Table 1.2) (IBM 2005). 
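
A minimal sketch of one such control loop may help fix the idea. It assumes a hypothetical pool of identical servers whose only managed property is its size, and a single high-level objective (a target utilisation) supplied by an administrator. The class names, the tolerance value, and the scaling rule are illustrative assumptions and do not reproduce IBM's reference architecture.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Knowledge:
    """Shared knowledge: the administrator's objective plus observations."""
    target_utilisation: float = 0.5
    history: list = field(default_factory=list)


@dataclass
class ManagedPool:
    """Hypothetical managed element: a pool of identical servers."""
    servers: int
    demand: float  # work units; one server handles 1.0 unit at full load

    def utilisation(self):
        return self.demand / self.servers


def monitor(pool, knowledge):
    """Collect a measurement from the managed element and record it."""
    utilisation = pool.utilisation()
    knowledge.history.append(utilisation)
    return utilisation


def analyse(utilisation, knowledge, tolerance=0.1):
    """Return the deviation from the objective, or 0.0 if within tolerance."""
    gap = utilisation - knowledge.target_utilisation
    return gap if abs(gap) > tolerance else 0.0


def plan(pool, gap, knowledge):
    """Decide how many servers to add (positive) or remove (negative)."""
    if gap == 0.0:
        return 0
    desired = max(1, math.ceil(pool.demand / knowledge.target_utilisation))
    return desired - pool.servers


def execute(pool, delta):
    """Apply the planned change to the managed element."""
    pool.servers += delta


def mape_k_iteration(pool, knowledge):
    """One pass through Monitor, Analyse, Plan, and Execute over shared Knowledge."""
    utilisation = monitor(pool, knowledge)
    gap = analyse(utilisation, knowledge)
    delta = plan(pool, gap, knowledge)
    execute(pool, delta)
    return delta
```

Calling `mape_k_iteration` repeatedly on an overloaded pool first enlarges it and then leaves it unchanged once utilisation is within tolerance of the target, with no administrator involved beyond setting the objective.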

The so-called self-* aspects of IBM’s vision of autonomic computing 
are used in a wide range of related advanced technology initiatives and 
have been extended to include self-awareness, self-monitoring, and self- 
adjustment (Dobson et al. 2010). Despite the significant volume of 
research on self-management, like self-organisation, implementation of 
self-management presents significant challenges. These include issues 
related to the application of the agent-oriented paradigm, designing a 
component-based approach (including composition formalisms) for 
supporting self-management, managing relationships between autonomic 
elements, distribution and decentralisation at the change management 
layer, design and implementation of robust learning and optimisation 
techniques, and robustness in a changing environment (Kramer and 
Magee 2007; Nami et al. 2006). 

Research on the application of the principles of emergence, self- 
organisation, and self-management is widely referenced in Computer 
Science literature, typically discretely. There are few significant studies on 
architectures combining such principles. One such example is that of the 
Organic Computing project funded by the German Research Foundation 
(DFG). This research programme focused on understanding emergent 
global behaviour in “controlled” self-organising systems with an emphasis 
on distributed embedded systems (Müller-Schloer et al. 2011). However, 
for cloud computing architectures, there are relatively few examples. This 
is not to say that there is a dearth of applications of these concepts for 
specific cloud computing functions. There are numerous examples of bio- 
inspired algorithms for task scheduling (e.g. Li et al. 2011; Pandey et al. 
2010), load balancing (Nishant et al. 2012), and other cloud-related func- 
tions. Similarly, Guttierez and Sim (2010) describe a self-organising agent 
system for service composition in the cloud. However, these are all at the 
sub-system level. The relatively few cloud architectural studies, other than 
those relating to CloudLightning, are all the more surprising given that 
some commentators, notably, Zhang et al. (2010), posit that cloud 
computing systems are inherently self-organising. Such a proposition is not to 
dismiss self-management in cloud computing outright. Indeed, Zhang 
et al. (2010) admit that cloud computing systems exhibit autonomic 
features. However, a more purist interpretation suggests that these are not 
self-managing and do not explicitly aim to reduce complexity. Marinescu 
et al. (2013) emphasise the suitability of self-organisation as a design 
principle for cloud computing systems, proposing an auction-driven 
self-organising cloud delivery model based on the tenets of autonomy of 
individual components, self-awareness, and intelligent behaviour of individual 
components including heterogeneous resources. Similarly, while 
self-management has been applied at a sub-system or node level (e.g. Brandic 
2009), there are few studies on large-scale self-managing cloud 
architectures. One such system-level study is Puviani and Frei (2013) who, 
building on Brandic (2009), propose a catalogue of adaptation patterns based 
on requirements, context, and expected behaviour. These patterns are 
classified according to the service components and autonomic managers. 
Control loops following the MAPE-K approach enact adaptation. In their 
approach, each service component is autonomous and autonomic and has 
its own autonomic manager that monitors itself and the environment. The 
service is aware of changes in the environment including new and 
disappearing components and adapts on a negotiated basis with other 
components to meet system objectives. While Puviani and Frei (2013) and 
Marinescu et al. (2013) propose promising approaches, they are largely 
theoretical and their conclusions lack data from real 
implementations. 


Table 1.2 Self-management aspects of autonomic computing (adapted from 
Kephart and Chess 2003) 

Concept             Description                                       Benefit
------------------  ------------------------------------------------  ---------------------
Self-configuration  Automated configuration of components and         Increased
                    systems follows high-level policies. Rest of      responsiveness
                    system adjusts automatically and seamlessly.

Self-optimization   Components and systems continually seek           Increased operational
                    opportunities to improve their own                efficiencies
                    performance and efficiency.

Self-healing        System automatically detects, diagnoses, and      Increased resilience
                    repairs localised software and hardware
                    problems.

Self-protection     System automatically defends against malicious    Increased security
                    attacks or cascading failures. It uses early
                    warning to anticipate and prevent system-wide
                    failures.

While emergence, self-organisation, and self-management may prove to 
be principles for reducing overall system complexity, for an HPC use case, 
the issue of minimal interference remains. At the same time, surveys of the 
HPC end-user community emphasise the need for “ease of everything” in 
the management of HPC (IDC 2014). To create a service-oriented archi- 
tecture that can cater for heterogeneous resources while at the same time 
shielding deployment and optimisation effort from the end user is not 
insignificant. As discussed, it is counter-intuitive to the conventional 
general-purpose model, which, in effect, is one-size-fits-all for end users. 
Separation of concerns is a concept that implements a “what-how” 
approach to cloud architectures, separating application lifecycle management 
from resource management. The end user, HPC or otherwise, focuses its 
effort on what needs to be done, while the cloud service provider 
concentrates on how it should be done. In this way, the technical details for 
interacting with cloud infrastructure are abstracted away and instead the 
end user or enterprise application developer provides (or selects) a detailed 
deployment plan including constraints and quality of service parameters 
using a service description language and service delivery model provided 
by the cloud service provider, a process known as blueprinting. Blueprinting 
empowers an “end-user-centric view” by enabling end users to use highly 
configurable service specification templates as building blocks to 
(re)assemble cloud applications quickly while at the same time maintaining 
minimal interference with the underlying infrastructure (Papazoglou 2012). 
While there are a number of existing application lifecycle frameworks for 
PaaS (e.g. Apache Brooklyn and OpenStack Solum) and resource 
frameworks for IaaS (OpenStack Heat) that support blueprints, neither the 
blueprints nor the service delivery models have been designed to 
accommodate emergence, self-organisation, or self-management. 
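
The “what-how” split can be sketched in a few lines. The example below is a hypothetical illustration only: the `Blueprint` and `Offer` types, their fields, the catalogue figures, and the throughput-per-watt selection rule are all assumptions made for the sake of the example and do not represent the CloudLightning service description language or any of the frameworks named above. The end user states what is required, and the provider-side `resolve` function decides how to satisfy it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Blueprint:
    """The end user's 'what': a service plus constraints (illustrative fields)."""
    service: str
    min_throughput: float     # required work units per second
    max_power_watts: float    # upper bound on power draw


@dataclass(frozen=True)
class Offer:
    """The provider's 'how': one concrete way of hosting a service."""
    service: str
    resource_type: str        # e.g. "CPU", "GPU", "MIC", "DFE"
    throughput: float
    power_watts: float


def resolve(blueprint, catalogue):
    """Provider-side choice: the feasible offer with the best throughput per watt.

    Returns None when no offer satisfies the blueprint's constraints.
    """
    feasible = [
        offer
        for offer in catalogue
        if offer.service == blueprint.service
        and offer.throughput >= blueprint.min_throughput
        and offer.power_watts <= blueprint.max_power_watts
    ]
    if not feasible:
        return None
    return max(feasible, key=lambda offer: offer.throughput / offer.power_watts)


# Invented catalogue figures, used only to exercise the sketch.
CATALOGUE = [
    Offer("ray-tracing", "CPU", throughput=100.0, power_watts=200.0),
    Offer("ray-tracing", "GPU", throughput=900.0, power_watts=300.0),
    Offer("ray-tracing", "MIC", throughput=600.0, power_watts=250.0),
]

request = Blueprint("ray-tracing", min_throughput=500.0, max_power_watts=280.0)
chosen = resolve(request, CATALOGUE)
print(chosen.resource_type if chosen else "no feasible offer")
```

The end user never names a resource type. Changing the catalogue or the selection rule alters the outcome without any change to the blueprint, which is the point of the separation.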


1.7 APPLICATION SCENARIOS 


It is useful when reading further, to have one or more use cases in mind 
that might benefit from HPC in the cloud and more specifically a novel 
cloud computing architecture to exploit heterogeneity and self-* 
principles. Three motivating use cases are presented: (i) oil and gas exploration, 
(ii) ray tracing, and (iii) genomics. These fall into the three HPC applica- 
tion categories discussed earlier, that is, tightly coupled applications, 
loosely coupled applications, and data-intensive applications. In each case, 
an architecture exploiting heterogeneous resources and built on the prin- 
ciples of self-organisation, self-management, and separation of concerns is 
anticipated to offer greater energy efficiency. By exploiting heterogeneous 
computing technologies, the performance/cost and performance/watt 
are anticipated to improve significantly. In addition, heterogeneous 
resources will enable computation to be hosted at hyperscale in the cloud, 
making large-scale compute-intensive applications and by-products acces- 
sible and practical from a cost and time perspective for a wider group of 
stakeholders. In each use case, even relatively small efficiency and accuracy 
gains can result in competitive advantage for industry. 


1.7.1 Oil and Gas Exploration 


The oil and gas industry makes extensive use of HPC to generate images 
of earth’s subsurface from data collected from seismic surveys as well as 
compute-intensive reservoir modelling and simulations. Seismic surveys 
are performed by sending sound pulses into the earth or ocean, and 
recording the reflection. This process is referred to as a “shot”. To 
generate images in the presence of complex geologies, a computationally 
intensive process called Reverse Time Migration (RTM) can be used. RTM 
operates on shots, and for each shot, it runs a computationally and data- 
expensive wave propagation calculation and a cross-correlation of the 
resulting data to generate an image. The images from each shot are 
summed to create an overall image. Similarly, the Open Porous Media 
(OPM) framework is used for simulating the flow and transport of fluids 
in porous media and makes use of numerical methods such as Finite 
Elements, Finite Volumes, Finite Differences, amongst others. These pro- 
cesses and simulations typically have not been operated in the cloud 
because of (a) data security, (b) data movement, and (c) poor perfor- 
mance. At the same time, on-site in-house HPC resources are often inad- 
equate due to the “bursty” nature of processes where peak demand often 
exceeds compute resources. RTM and OPM are exemplars of tightly cou- 
pled applications. 

One solution to address challenges and objections related to poor per- 
formance is to use a self-organising, self-managing cloud infrastructure to 
harness larger compute resources efficiently to deliver more energy- and 
cost-efficient simulations of complex physics using OPM/Distributed and 
Unified Numerics Environment (DUNE). As well as supporting greater 
cloud adoption for HPC in the oil and gas sector, the development of a 
convenient scalable cloud solution in this space can reduce the risk and 
costs of dry exploratory wells. Relatively small efficiency and accuracy 
gains in simulations in the oil and gas industry can result in disproportion- 
ately large benefits in terms of European employment and Gross Domestic 
Product (GDP). 


1.7.2 Ray Tracing 


Ray tracing is widely used in image processing applications, such as those 
used in digital animation productions where the development of an image 
from a 3D scene is achieved by tracing the trajectories of light rays through 
pixels in a view plane. In recent years, the advancement of HPC and new 
algorithms has enabled the processing of large numbers of computational 
tasks in a much smaller time. Consequently, ray tracing has become a 
potential application for interactive visualisations. Ray tracing is commonly 
referred to as an “embarrassingly parallelisable algorithm” and is naturally 
implemented in multicore shared memory systems and distributed 
tems. It is an example of a loosely coupled application. 
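
The loosely coupled structure is visible even in a toy renderer. The sketch below assumes a single hard-coded sphere, a view plane at z = 1, and one ray per pixel cast from the origin; the scene, the image size, and all function names are illustrative assumptions. Each row of pixels is computed without reference to any other row, so rows can be handed to independent workers. A thread pool is used here only to show that structure; in CPython it should not be expected to yield a real speed-up for this pure-Python arithmetic.

```python
import math
from concurrent.futures import ThreadPoolExecutor

WIDTH, HEIGHT = 16, 8
SPHERE_CENTRE = (0.0, 0.0, 3.0)
SPHERE_RADIUS = 1.0


def hits_sphere(direction):
    """True if a ray from the origin along `direction` meets the sphere.

    Substituting the ray t*d into |p - c|^2 = r^2 gives a quadratic in t;
    a hit needs a real root that lies in front of the origin (t > 0).
    """
    dx, dy, dz = direction
    cx, cy, cz = SPHERE_CENTRE
    a = dx * dx + dy * dy + dz * dz
    b = -2.0 * (dx * cx + dy * cy + dz * cz)
    c = cx * cx + cy * cy + cz * cz - SPHERE_RADIUS ** 2
    discriminant = b * b - 4.0 * a * c
    if discriminant < 0.0:
        return False
    return (-b + math.sqrt(discriminant)) / (2.0 * a) > 0.0


def render_row(y):
    """Trace one ray through each pixel of row y; rows share no state."""
    row = []
    for x in range(WIDTH):
        # Map the pixel centre to [-1, 1] x [-1, 1] on the view plane.
        u = (x + 0.5) / WIDTH * 2.0 - 1.0
        v = 1.0 - (y + 0.5) / HEIGHT * 2.0
        row.append("#" if hits_sphere((u, v, 1.0)) else ".")
    return "".join(row)


def render(workers=4):
    """Render all rows, distributing them over a pool of workers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(render_row, range(HEIGHT)))


if __name__ == "__main__":
    print("\n".join(render()))
```

Because `render_row` depends only on its own row index, the same function could be mapped over processes, nodes, or accelerator threads without altering the algorithm.
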
Ray tracing has applications in a wide variety of industries including: 


• Image rendering for high resolution and 3D images for the 
animation and gaming industry 

• Human blockage modelling in radio wave propagation studies and 
for general indoor radio signal prediction 

• Atmospheric radio wave propagation 

• Modelling solar concentrator designs to investigate performance and 
efficiency 

• Modelling laser ablation profiles in the treatment of high myopic 
astigmatism to assess the efficacy, safety, and predictability 

• Development of improved ultrasonic array imaging techniques in 
anisotropic materials 

• Ultrasonic imaging commonly used in inspection regimes, for 
example, weld inspections 

• Modelling light-emitting diode (LED) illumination systems 


These industries have significant scale, and they increasingly rely on com- 
putationally intensive image processing, accelerated by innovations in con- 
sumer electronics, for example, HDTV and 3D TV. A variety of ray tracing 
libraries exist that are optimised for MIC and GPU platforms, for example, 
Intel Embree and NVIDIA OptiX. 


1.7.3 Genomics 


Genomics is the study of all of a person’s genes (the genome), including 
interactions of those genes with each other and with the person’s 
environment. Since the late 1990s, academic and industry analysts have 
identified the potential of genomics to realise significant gains in develop- 
ment time and reduced investment, largely attached to realising efficiency 
gains. Genomics provides pharmaceutical companies with long-term 
upside and competitive advantage through savings right along the Research 
and Development (R&D) value chain (including more efficient target dis- 
covery, lead discovery, and development) but also in better decision-mak- 
ing accuracy resulting from more, better, and earlier information which 
ultimately results in higher drug success rates (Boston Consulting Group 
2001). The net impact is that genomics can result in more successful drug 
discovery. Relatively small efficiency and accuracy gains in the 
pharmaceutical industry can result in disproportionately large benefits in terms of 
employment and GDP. However, genome processing requires substantial 
computational power and storage requiring significant infrastructure and 
specialist IT expertise. While larger organisations can afford such infra- 
structure, it is a significant cost burden for smaller pharmaceutical compa- 
nies, hospitals and health centres, and researchers. Even when such an 
infrastructure is in place, researchers may be stymied by inadequate offsite 
access. 
Genomics has two core activities: 


• Sequencing: a laboratory-based process involving “reading” 
deoxyribonucleic acid (DNA) from the cells of an organism and digitising 
the results 

• Computation: the processing, sequence alignment, compression, and 
analysis of the digitised sequence 


Historically, the cost of sequencing has represented the most significant 
percentage of the total. However, this cost has decreased dramatically over 
the past decade due to breakthroughs in research and innovation in that 
area. As the cost of sequencing has dropped, the cost of computation 
(alignment, compression, and analysis) has formed a greater proportion of 
the total. The biggest consumer of compute runtime is sequence align- 
ment—assembling the large number of individual short “reads” which 
come out of the sequencer (typically, a few hundred bases long) into a 
single complete genome. This can be split into many processing jobs, each 
processing batches of reads and aligning against a reference genome, and 
run in parallel. Significant input data is required, but there is little or no 
inter-node communication needed. The most computationally intensive 
kernel in the overall process is local sequence alignment, using algorithms 
such as Smith-Waterman, which is very well suited to being optimised 
through the use of heterogeneous compute technologies such as DFEs. 
Genome processing is an exemplar of a data-intensive application. 
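
For readers unfamiliar with the kernel, the following is a minimal sketch of local sequence alignment scoring by dynamic programming in the style of Smith-Waterman. It assumes a linear gap penalty and illustrative default scores, returns only the best score and the cell where it ends, and omits the traceback that recovers the alignment itself. It is a plain reference version intended to show the recurrence, not an optimised implementation of the kind that would be deployed on a DFE or other accelerator.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score between sequences a and b.

    h[i][j] holds the best score of any local alignment ending at a[i-1]
    and b[j-1]. Clamping at zero lets an alignment start anywhere, which
    is what makes the alignment local rather than global.
    Returns (best_score, (i, j)) where (i, j) is the end cell.
    """
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            h[i][j] = max(
                0,                  # start a fresh alignment here
                diagonal,           # align a[i-1] with b[j-1]
                h[i - 1][j] + gap,  # gap in b
                h[i][j - 1] + gap,  # gap in a
            )
            if h[i][j] > best:
                best, best_pos = h[i][j], (i, j)
    return best, best_pos


if __name__ == "__main__":
    read, reference = "ACGT", "TTACGTTT"
    print(smith_waterman(read, reference))
```

Each read is scored against the reference independently of every other read, which is why batches of reads can be distributed across nodes with little or no inter-node communication, while the inner recurrence is regular integer arithmetic of the kind that suits dataflow hardware.
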
Greater energy efficiency is anticipated from using heterogeneous com- 
puting resulting in lower costs. As the cost of the raw sequencing technol- 
ogy drops, the computing challenge becomes the final significant 
technology bottleneck preventing the routine use of genomics data in 
clinical settings. Not only can the use of heterogeneous computing 
technologies offer significantly improved performance/cost and 
performance/watt, but enabling this computation to be hosted at large scale in the 
cloud makes it practical for wide-scale use. In addition to realigning the 
computation cost factors in genome processing with sequencing costs, an 
HPC solution can significantly improve the genome processing through- 
put and speed of genome sequence computation thereby reducing the 
wider cycle time thus increasing the volume and quality of related research. 
The benefits of such a cloud solution for genome processing are obvious. 
Researchers, whether in large pharmaceutical companies, genomics 
research centres, or health centres, can invest their energy and time in 
R&D and not managing and deploying complex on-site infrastructure. 


1.8 CONCLUSION 


This chapter introduces two computing paradigms—cloud computing and 
HPC, both of which are being impacted by technological advances in het- 
erogeneous computing but also hampered by energy inefficiencies and 
increasing complexity. A combination of self-organisation, self-management, 
and the separation of concerns is proposed as design principles for a new 
hyperscale cloud architecture that can exploit the opportunities presented 
by heterogeneity to deliver more energy-efficient cloud computing and, in 
particular, support HPC in the cloud. 

This book presents CloudLightning, a new way to provision heteroge- 
neous cloud resources to deliver services, specified by the user, using a 
bespoke service description language. As noted, self-organising and self- 
managing systems present significant architecture design, methodological, 
and development challenges. These challenges are exacerbated when com- 
bined and considered at hyperscale. The remainder of this book presents 
CloudLightning’s response to these challenges illustrating the utilisation 
of concepts in emergence, self-organisation, self-management, and the 
separation of concerns in a reference architecture for hyperscale cloud 
computing (Chap. 2). 

Chapter 3 describes the self-organising and self-management formal- 
isms designed to support coordination mechanisms within the 
CloudLightning architecture. As discussed earlier, stakeholders in cloud 
computing, and specifically HPC end users, have different concerns, for 
example, enterprise application developers and end users may want greater 
control over application lifecycle management, and cloud service 
providers want greater control over resource management. To support the 
separation of concerns and ease of use, a minimally intrusive service delivery 
model is presented in Chap. 4. This model uses a CloudLightning-specific 
service description language, blueprinting, and gateway service to enable 
enterprise application developers to specify comprehensive constraints and 
quality of service parameters for services and/or resources and, based on 
the specified constraints and parameters, provide an optimal deployment 
of the resources. 

Finally, Chap. 5 addresses the issue of validation of such a novel archi- 
tecture. As per Sudeikat et al. (2009), the validation of self-organising 
models summatively and formatively presents significant challenges that 
are further complicated at hyperscale. Chapter 5 presents CloudLightning’s 
work on the design and implementation of a Warehouse-Scale cloud simu- 
lator for validating the performance of CloudLightning. 


1.9 CHAPTER 1 RELATED CLOUDLIGHTNING READINGS 


1. Lynn, T., Xiong, H., Dong, D., Momani, B., Gravvanis, G. A., 
A framework for a self-organising and self-managing heterogeneous 


NOTES 


1. https://www.top500.org/green500/ 

2. There are other niche processor solutions worth exploring including 
Automata Processors for graph analysis, pattern matching, and data 
analytics; Digital Signal Processors for processing real-world analogue signals; 
Application-Specific Integrated Circuits (ASICs) for use cases such as bit- 
coin mining; and neuromorphic chips for cognitive computing. For more 
discussion, see Zahran (2017). 


Abstract An overview of the traditional three-layer cloud architecture is 
presented as background for motivating the transition to clouds contain- 
ing heterogeneous resources. Whereas this transition adds many impor- 
tant features to the cloud, including improved service delivery and reduced 
energy consumption, it also results in a number of challenges associated 
with the efficient management of these new and diverse resources. The 
CloudLightning architecture is proposed as a candidate for addressing this 
emerging complexity, and a description of its components and their 
relationships is given. 


2.1 INTRODUCTION 


Cloud end-users are demanding greater performance and diversity of 
cloud services than ever before. As discussed in Chap. 1, the high- 
performance computing (HPC) and other end-user communities are 
seeking to exploit new and diverse hardware designed for specialist tasks. 
As well as supporting these new demands, cloud service providers (CSPs) 
face the challenges of achieving cost-effective scalability while increasing 
energy efficiency. Accommodating heterogeneity and maximising server 
utilisation (and by inference minimising over-provisioning) is a significant 
shift from conventional homogeneous cloud computing service design. 
This is particularly the case with HPC where end-users require a greater 
level of access and control over elements of the cloud infrastructure. To 
access heterogeneous resources, exploit these resources to reduce applica- 
tion development effort, make optimisation easier, and simplify service 
deployment, a re-evaluation of our approach to both resource manage- 
ment and service delivery is required. 

The remainder of this chapter discusses conventional cloud architecture 
designs and provides an overview of the CloudLightning architecture, a 
novel architecture designed to meet the challenges of the heterogeneous 
cloud. The next section presents the three layers of conventional cloud 
architectures—the Infrastructure Layer, the Cloud Management Layer, 
and the Service Delivery Layer. This is followed by a discussion of the 
main challenges associated with transitioning to a truly heterogeneous 
cloud with an emphasis on resource management and abstraction. In Sect. 
2.4 CloudLightning is presented, a cloud architecture inspired by the 
design principles of emergence, self-organisation, self-management, and 
the separation of concerns discussed in Chap. 1. Each functional compo- 
nent and their relationships are detailed to provide insights into how it 
differs from the conventional cloud and realises important properties from 
the end-user and CSP perspectives including support for heterogeneity, 
ease of use, auto-scaling, data locality, high availability (HA), and net- 
working organisation. 


2.2 CLOUD ARCHITECTURE 


Over the last decade, large-scale consumer-facing cloud services have been 
created by service providers such as Amazon, Microsoft, Google, and 
Rackspace. The data centres behind these services are large industrial facilities containing the 
computing infrastructure that runs the services: servers, storage arrays, 
and networking equipment. This core equipment requires supporting 
infrastructure in the form of power, cooling, and external networking 
links. Reliable service delivery depends on the holistic management of all 
of this infrastructure as a single integrated entity. Architecturally, this
holistic management can be logically separated into three layers, from
bottom to top: an Infrastructure Layer, a Cloud Management Layer,
and a Service Delivery Layer, as shown in Fig. 2.1.


2.2.1 Infrastructure Organisation 


Cloud infrastructure design is the art of balancing requirements to ensure 
data centre scalability, maintaining server fault tolerance, minimising costs, 
and maximising bisection end-to-end bandwidth (Kim 2011; Wang et al.
2014). Traditional data centre infrastructure is based on a hierarchical
structure, typically with a three-tier design comprising the Access Layer,
the Aggregation Layer, and the Core Layer (Martin Pueblas 2010), as shown
in Fig. 2.2.

Fig. 2.1 Classical cloud architecture is considered to be composed of three
layers. The Service Delivery Layer is the one seen by users; this layer is
realised by the Cloud Management Layer, which is also responsible for
realising the objectives of the Cloud Service; the Infrastructure Layer
comprises the underlying collection of storage, computing, and network
resources and their required hardware and software

Fig. 2.2 The traditional three-tier networking infrastructure


• The Access Layer (also called the Edge Layer): The primary function
of the Access Layer is to connect servers that typically reside in the
same rack. An Access-Layer switch is thus often referred to as a
Top-of-Rack (ToR) switch.

• The Aggregation Layer (also called the Distribution Layer): The
Aggregation Layer is a multi-purpose system that interfaces the
Access and Core Layers. The main function of the Aggregation Layer
is to keep the various communication domains separate, thus
providing intelligent switching and HA between regional ToRs.

• The Core Layer: The Core Layer is responsible for providing
high-speed, scalable, and reliable connectivity across the entire data
centre.


This traditional three-tier data centre design is created with simplicity 
in mind. The design relies on the use of high-end enterprise-class switches 
in the upper layers, whereas the lower layers can function effectively with 
less sophisticated equipment. Previous research has indicated that adding
servers to a data centre built on the traditional three-tier design
reduces the end-to-end bisection bandwidth in proportion to the size
of the data centre (Al-Fares et al. 2008). In support of cloud computing
and in response to the rise in popularity of Big Data and High-Performance 
Computing as a Service (BDaaS and HPCaaS, respectively), the organisa- 
tion of the infrastructure in modern data centres is biased towards scal- 
ability and high throughput. 

In general, design strategies are centred on two basic models: the
Switch-Centric model and the Server-Centric model. The following
subsections discuss these models and the main network designs associated
with them.


2.2.1.1 The Switch-Centric Model 

In the Switch-Centric model, servers are interconnected using switches
and routers. The Fat-tree network is a representative of the Switch-Centric
model that is widely acknowledged and used for data centre networking
infrastructure. A Fat-tree network is also known as a Clos topology
(Leiserson 1985). In a Fat-tree network, servers are grouped into Points
of Delivery (PoDs). A PoD consists of n servers and n switches. Of these,
n/2 switches are connected to the n servers and act as Access-Layer
switches. The remaining switches are connected to the Access-Layer
switches and to each other, acting as Aggregation-Layer switches.
Moreover, PoDs are connected using an additional (n/2)² switches acting as
Core-Level interconnections. Thus, the Fat-tree design guarantees a
one-to-one over-subscription ratio between any pair of nodes in the network.
However, the scalability of the infrastructure is limited by the number of
ports available on each switch. BCube (Guo et al. 2009) is another
Switch-Centric design based on a recursively defined topology. In a BCube
design, n servers are connected to an n-port switch forming a cell, and n
cells are connected through n switches to form a cube. BCube is designed
for modular data centres and accommodates high performance in a multicast
and broadcast network; however, the complexity of network cabling is
relatively high. PortLand (Niranjan Mysore et al. 2009), RBridges (Ghanwani
2011), SmartBridge (Rodeheffer 2000), SEATTLE (Kim 2011), and VL2
(Greenberg et al. 2011) are commonly used routing and forwarding
protocols and network address schemes for the Fat-tree-based infrastructure.
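
The port-count limit on Fat-tree scalability can be made concrete with a
short calculation. The sketch below assumes the standard k-ary Fat-tree
construction of Al-Fares et al. (2008), in which every switch has k ports;
the function name and dictionary keys are illustrative, and the per-PoD
server count follows that construction rather than the simplified
description above.

```python
def fat_tree_sizes(k: int) -> dict:
    """Component counts for a k-ary Fat-tree built from k-port switches.

    Assumption: the standard construction with k PoDs, each holding k/2
    Access-Layer and k/2 Aggregation-Layer switches.
    """
    if k < 2 or k % 2:
        raise ValueError("k must be a positive even port count")
    half = k // 2
    return {
        "pods": k,
        "access_switches_per_pod": half,
        "aggregation_switches_per_pod": half,
        "core_switches": half * half,
        "servers_per_pod": half * half,        # k/2 servers per access switch
        "total_servers": k * half * half,      # k^3 / 4
        "total_switches": k * k + half * half  # 5k^2 / 4
    }


sizes = fat_tree_sizes(4)
print(sizes["total_servers"], sizes["total_switches"])  # 16 20
```

Because the server count grows with the cube of the port count, expanding
beyond the limit for a given k means replacing switches rather than simply
adding more of them.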


2.2.1.2 The Server-Centric Model 

In the Server-Centric model, both servers and switches participate in
packet routing and forwarding. DCell (Guo et al. 2008) is a
representative implementation of the Server-Centric model. In DCell, n
servers are connected to an n-port switch forming the smallest entity,
known as a Cell. A total of n+1 Cells are interconnected via the network
interfaces of each server, thus forming a larger network. The hierarchical
topological design makes DCell networks scalable and robust. However,
the network diameter increases exponentially with the size of the network.
This implies that Cells in the inner layer will carry more network traffic,
and end-to-end communications may experience greater latency. FlatNet
(Lin et al. 2012) is another Server-Centric, recursively defined network.
The FlatNet design uses more switches to achieve higher scalability, n³,
compared to n² for DCell. Based on rules similar to those used in DCell,
FlatNet organises n servers on an n-port switch as a Cell. A higher layer
is formed from n lower layers. In FiConn configurations, the main network
interfaces of a server are connected to their corresponding ToR switch(es),
and the redundant network interfaces of a server are used to establish direct
server-to-server connections (Li et al. 2009). In contrast to DCell, FiConn,
and FlatNet, the SprintNet design focuses on high performance. SprintNet
uses c switches to connect n servers in each Cell, in
which n/(c+1) ports connect to other Cells in the network. Infrastructure
expansions are achieved by adding c*n/(c+1) Cells each time.
SprintNet is specially designed for high-throughput infrastructure.
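
The scalability of a recursively defined topology is easiest to see by
iterating the construction rule. The following sketch assumes the DCell
rule from Guo et al. (2008), where a level built from sub-networks of t
servers contains t+1 of them; the function name is illustrative.

```python
def dcell_servers(n: int, level: int) -> int:
    """Number of servers in a DCell of the given level using n-port switches.

    Level 0 is a single Cell of n servers. Each higher level joins t + 1
    copies of the level below, where t is that lower level's server count.
    """
    if n < 1 or level < 0:
        raise ValueError("n must be >= 1 and level must be >= 0")
    servers = n
    for _ in range(level):
        servers = servers * (servers + 1)
    return servers


for level in range(4):
    print(level, dcell_servers(4, level))
# 0 4
# 1 20
# 2 420
# 3 176820
```

With only 4-port switches, three levels of recursion already exceed
170,000 servers, which illustrates why the recursive designs are
attractive when scalability is the primary concern.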

The current trend is towards using a Server-Centric design based on a 
recursively defined topology. From a cloud management perspective, the 
number of servers determines scalability, the number of switches affects 
the infrastructure cost and the energy efficiency, the number of links indi- 
cates the complexity of constructing the network, and the diameter of the 
network directly influences the network throughput (high-throughput
networks will improve the service delivery experience, especially for Big
Data, HPC, and high-throughput computing (HTC) applications).
HPC and HTC based on heterogeneous computational resources may 
have specific requirements on the types of switches, port numbers, and 
link capacity. Unfortunately, none of the existing design schemes can guar- 
antee scalability, fault tolerance, high performance, and energy efficiency 
at the same time. To this end, a hybrid infrastructure organisation scheme 
using the combination of several interconnected topological designs may 
be required. For example, a combination of Fat-tree, BCube, and 
SprintNet may be capable of providing the required infrastructure. As a 
side effect, a hybrid design introduces further complexity that must be 
managed.


2.2.2 The Cloud Management Layer


Depending on the business goals, the technologies chosen to implement a
cloud architecture vary from vendor to vendor. In principle, all cloud
architecture implementations aim to realise quality attributes that most
appropriately reflect the business goals of the CSP. In Chap. 1, cloud
computing was defined, as per the National Institute of Standards and Technology,
as having five properties including on-demand self-service, broad network 
access, resource pooling, rapid elasticity, and measured service (Mell and 
Grance 2011). Technically, any data centre having those properties can be 
considered as a cloud. These properties can be realised by composing a set 
of commonly acknowledged functional components, as shown in Fig. 2.3. 
In principle, all cloud management platforms follow the same architectural 
design, but their implementations vary greatly. The following sections give 
a high-level overview of how two representative cloud management plat- 
forms, namely OpenStack and Google Kubernetes, implement the classical 
cloud architecture, based on virtualisation and containerisation technolo- 
gies, respectively. 


2.2.2.1 OpenStack 

OpenStack (OpenStack, LLC 2017) is an open-source cloud platform
designed to manage virtualised environments. Hypervisors are used to
virtualise servers; various technologies including Virtual Local Area Networks,
Linux kernel namespaces, and various tunnelling techniques are used to
virtualise networks; and storage resources are abstracted through the use
of Network File Systems, Remote Volume, Object Storage, and other
network-based clustering file systems such as GlusterFS (Red Hat &
GlusterFS 2012), Ceph (Weil 2006), and Google File System (Ghemawat
et al. 2003).

Fig. 2.3 Cloud management architecture: a component view

In particular, for managing computational resources, OpenStack uses a 
front-end Application Programming Interface (API) server for receiving 
and answering requests. Typically, allocating a computational resource will 
require other components, for example, a virtual network, a security 
group, and operating system images. This can be a complex task when 
dealing with multiple simultaneous requests with different configurations. 
In order to reduce this complexity, the front-end API server forwards the
requests to a nova-conductor service. The nova-conductor coordinates the
various associated components to satisfy a particular request. The
nova-conductor uses a scheduler service (nova-scheduler) to locate potential
physical server(s) that meet the specified requirements, including the
number of Central Processing Unit (CPU) cores, the size of memory, and 
storage space. The requested resources (Virtual Machines [VMs]) will be 
deployed by a nova-compute service (by calling hypervisor-specific APIs) 
on the most appropriate physical servers. Architecturally, computational
resource management consists of a front-end API server, request
coordinators (which can be a group of coordinators to deal with
high-volume requests), and an agent per computational node (executing the
actual resource provisioning and deployment commands).
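
The scheduling step described above, locating servers that meet the
requested CPU, memory, and storage and then picking the most appropriate
one, can be sketched as a filter followed by a ranking. This is not
nova-scheduler code or its API; the class names, fields, and the choice
to rank by free memory are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Host:
    """Hypothetical view of one physical server's spare capacity."""
    name: str
    free_vcpus: int
    free_ram_mb: int
    free_disk_gb: int


@dataclass
class Request:
    """Hypothetical VM request: CPU cores, memory, and storage."""
    vcpus: int
    ram_mb: int
    disk_gb: int


def fits(host: Host, req: Request) -> bool:
    """Filter step: does the host satisfy every stated requirement?"""
    return (host.free_vcpus >= req.vcpus
            and host.free_ram_mb >= req.ram_mb
            and host.free_disk_gb >= req.disk_gb)


def select_host(hosts: List[Host], req: Request) -> Optional[Host]:
    """Rank the hosts that pass the filter; here, most free RAM wins."""
    candidates = [h for h in hosts if fits(h, req)]
    if not candidates:
        return None
    return max(candidates, key=lambda h: h.free_ram_mb)


hosts = [
    Host("node-a", free_vcpus=2, free_ram_mb=4096, free_disk_gb=50),
    Host("node-b", free_vcpus=8, free_ram_mb=16384, free_disk_gb=200),
    Host("node-c", free_vcpus=8, free_ram_mb=8192, free_disk_gb=200),
]
chosen = select_host(hosts, Request(vcpus=4, ram_mb=8192, disk_gb=100))
print(chosen.name)  # node-b
```

Separating the filter from the ranking keeps hard requirements distinct
from placement preferences, so the ranking policy can be changed without
touching the feasibility check.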

Managing networking in the cloud is a complex task. This is because 
conventional network functional components, for example, firewalls, rout- 
ers, switches, networking connections, and Network Interface Cards 
(NICs), must be provided to end-users on top of shared physical network- 
ing resources and networking equipment. These cannot be virtualised or 
containerised like computational resources using hypervisors or container 
engines; rather, networking virtualisation is mainly built on top of several 
packet tagging/encapsulation techniques and the use of software imple- 
mentations of respective networking devices such as virtual routers and 
virtual switches. 
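
As one concrete instance of packet tagging, the sketch below builds an
IEEE 802.1Q VLAN tag and inserts it into an Ethernet frame after the two
MAC addresses. The text does not say which tagging scheme a given
deployment uses, so the choice of 802.1Q and the helper names are
assumptions for illustration only.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q-tagged frame


def vlan_tag(vlan_id: int, priority: int = 0) -> bytes:
    """Build the 4-byte 802.1Q tag: 16-bit TPID followed by 16-bit TCI.

    The TCI packs a 3-bit priority, a 1-bit drop-eligible flag (left at 0
    here), and a 12-bit VLAN identifier.
    """
    if not 0 <= vlan_id <= 0xFFF:
        raise ValueError("VLAN ID must fit in 12 bits")
    if not 0 <= priority <= 7:
        raise ValueError("priority must fit in 3 bits")
    tci = (priority << 13) | vlan_id
    return struct.pack("!HH", TPID_8021Q, tci)


def tag_frame(frame: bytes, vlan_id: int, priority: int = 0) -> bytes:
    """Insert the tag after the destination and source MAC (12 bytes)."""
    if len(frame) < 14:
        raise ValueError("frame is shorter than an Ethernet header")
    return frame[:12] + vlan_tag(vlan_id, priority) + frame[12:]


print(vlan_tag(100).hex())  # 81000064
```

Frames from different tenants can then share the same physical links
while a virtual switch forwards each frame only within its own VLAN.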

OpenStack storage systems are decoupled from computational 
resources. OpenStack offers several basic types of storage systems includ- 
ing traditional database systems, network-attached storage, and object 
storage. The back-end technologies supporting these storage systems vary
greatly. In general, database systems and object storage are used by cloud
applications, whereas remote volumes are used when creating VMs. 


2.2.2.2 Google Kubernetes 

Kubernetes is the most recent evolution of Google data centre management
technology (Rensin 2015; Burns et al. 2016). Architecturally,
Kubernetes uses a master/worker model. It consists of a master server and
multiple minions (workers). The command line tools connect to the API
endpoint in the master, which manages and orchestrates all minions. The
minions receive instructions from the master and initialise local
containers appropriately.

A Kubernetes Master is composed of a number of components: the API
server, the Replication Controller, the etcd Daemon, and the Scheduler.
The API server is responsible for processing requests and for manipulating
the underlying state objects. The Replication Controller determines how
many pods or containers need to be run. The etcd Daemon stores
configuration data. Lastly, the Scheduler is used to place work on an appropriate
minion (or minions) based on an analysis of the state of the current
infrastructure and the requirements of the service being provisioned.

A Kubernetes Minion is also composed of a number of components: 
the Kubelet, the Proxy, the cAdvisor, and a Pod. The Kubelet manages the 
lifecycle of containers in response to instructions from the master. The 
Proxy forwards network traffic to the appropriate containers. It performs 
primitive load balancing and is responsible for making sure that each net- 
working environment is internally accessible while remaining isolated 
from other environments. The cAdvisor is a daemon that provides con- 
tainer users with an understanding of the resource usage and the perfor- 
mance characteristics of their containers. Finally, a Pod defines a collection 
of containers, deployed on the same minion, and provides them with a 
shared context. 
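
The interplay between the Replication Controller (how many pods should
run) and the Scheduler (where each one runs) can be illustrated with a
toy reconciliation step. This is a simplified model and not the
Kubernetes API; the function name, the capacity dictionary, and the
"most free slots" placement rule are assumptions.

```python
from collections import Counter
from typing import Dict, List


def reconcile(desired: int, placements: List[str],
              capacity: Dict[str, int]) -> List[str]:
    """Return an updated list of minion names, one entry per running pod.

    Pods are added on the minion with the most free slots until the
    desired count is reached or capacity runs out; surplus pods are
    removed from the end of the list.
    """
    placements = list(placements)
    load = Counter(placements)
    while len(placements) < desired:
        free = {m: capacity[m] - load[m]
                for m in capacity if capacity[m] > load[m]}
        if not free:
            break  # no minion can host another pod
        target = max(free, key=free.get)
        placements.append(target)
        load[target] += 1
    while len(placements) > desired:
        load[placements.pop()] -= 1
    return placements


capacity = {"minion-1": 2, "minion-2": 2}
print(reconcile(3, ["minion-1"], capacity))
# ['minion-1', 'minion-2', 'minion-1']
```

Running the same function repeatedly against the observed state is what
makes the approach self-correcting: a pod lost on one minion simply shows
up as a shortfall on the next pass.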


2.2.3 The Service Delivery Layer 


As outlined in Chap. 1, there are three basic cloud service delivery 
models: Software as a Service (SaaS), Platform as a Service (PaaS), and 
Infrastructure as a Service (IaaS). These service delivery models are also 
referred to as cloud business models or resource abstraction models. Each 
of these delivery models is realised in specific layers of the cloud
architecture. IaaS, for example, provides end-users access to tangible physical
infrastructures, such as physical servers, networking equipment, and storage
systems. IaaS also provides access to virtualised physical servers, known as
Virtual Machines. IaaS offers maximum flexibility to end-users for
configuring and operating the acquired resources; thus, IaaS targets end-user
groups interested in building Information Technology (IT) infrastructure.

In order to reduce the configuration complexity and operational costs, 
CSPs can provide pre-configured platforms and offer those ready-to-use 
platforms to the end-user. This service model is often referred to as 
PaaS. Examples of PaaS are pre-configured operating systems (e.g., Linux, 
Windows), Web application servers (e.g., Apache Tomcat, Oracle GlassFish,
Red Hat JBoss), Workflow Engines (e.g., Apache Orchestration Director
Engine), and Messaging frameworks (e.g., RabbitMQ, ZeroMQ). PaaS 
provides services to system administrators and developers in need of pre- 
configured platforms for their systems or applications to function as 
expected. Although PaaS can greatly reduce configuration complexity and 
operational costs, it still requires the end-users to have domain-specific 
knowledge to engage with the platforms being provided. There are also
cloud end-users who are interested only in consuming services, such as
email, business processes, and customised applications (for example,
Customer Relationship Management and Enterprise Resource Planning). When a
CSP has installed, configured, and provided those customer-facing soft- 
ware solutions as a service, they are referred to as SaaS. 

As the cloud ecosystem rapidly evolves, heterogeneous resources are 
being incorporated into the cloud environment, which has traditionally 
been homogeneous. This evolution requires multiple service abstraction 
modes to coexist and to be combined to provide more versatile services. 


2.3 TRANSITIONING TO HETEROGENEOUS CLOUDS 


Cloud infrastructure has traditionally been built on homogeneous 
resources. This approach afforded simplicity of design and uniformity of 
resource management. In recent years, different types of resources have 
been made available to the cloud user community and have proven to be 
extremely popular due to their speed and modest power consumption. 
This evolution of the traditional design is thus leading to the emergence of
the heterogeneous cloud. Heterogeneity is a broad concept. It can refer to 
different models of physical servers, produced by various manufacturers, 
and/or it can refer to different servers having different computational 
power, storage size, and networking capacities. Functionally, various types 
of coprocessors and accelerators, such as the Intel Xeon Phi Coprocessor
(Many Integrated Core [MIC]), the Field-Programmable Gate Array
(FPGA), and the Graphics Processing Unit (GPU), have already been
used in many production clouds. At a lower level, each type of CPU
(Advanced Micro Devices, Intel, or even Advanced Reduced Instruction
Set Computing Machine [ARM]), system memory (e.g., Double Data
Rate {1, 2, 3}, 3D transistors), and storage (e.g., mechanical disks
and Solid State Disks) has different speeds and power consumption
patterns. From a networking perspective, several types of networking
connections (e.g., 1 Gb/s standard Ethernet, 10/40 Gb/s high-speed Ethernet,
Fibre Optical network, and InfiniBand) coexist in many major cloud 
deployments. The heterogeneity in hardware, resource organisation 
schemes, and software creates rich features and services that can support a 
wide range of applications from general web applications and networking 
infrastructure services to Big Data processing, high-performance/ 
throughput computation applications, and recently the Network Virtual 
Function to support traditional telecommunication applications. 

Heterogeneity also has its challenges from a cloud management per- 
spective due to the complexity associated with managing diversity. Each 
type of hardware, resource organisation scheme, and software has its own 
unique static features, such as architecture, computation power, speed, 
and bandwidth, and each also exhibits different runtime patterns, such as 
power consumption, computation performance, access methods, and sup- 
porting software libraries. In order to efficiently and effectively manage 
such complex environments, the Cloud Management Layer must adapt to 
this evolving diversity. In this regard, the two most challenging aspects 
that must be addressed are the efficient management of resources and the 
support for flexible resource abstraction methods. 


2.3.1 Resource Management 


Heterogeneous resources introduce a large feature space into the cloud. 
The careful refinement of resource features and their combinations
provides two clear advantages: (i) support for a wide range of applications and
(ii) an appropriate mapping between application requirements/specifications
and the resource features/characteristics. These can serve the goals of both
the end-user and the CSP, for example, by maximising application performance
and reducing power consumption, respectively. This
process requires resource management capable of efficiently and
effectively manipulating such a large feature space at scale.

In the current cloud environment, resource scheduling can be categorised
into three schemes: Monolithic, Two-Level Scheduling, and Shared-State
(Schwarzkopf et al. 2013).

A Monolithic Scheduler has a single instance, is sequential, and must 
implement all policy choices in a single code base. The Google Borg 
scheduler is effectively monolithic, although the more recent releases of 
this scheduler have been optimised to provide internal parallelism and 
multi-threading to address HA and scalability. A Two-Level Scheduling 
approach separates application schedulers from resource schedulers. Mesos 
acts in this manner. It is an infrastructure management framework and 
makes use of a central master scheduler to decide how many resources 
from the available pool can be assigned to a framework. An application 
scheduler, within each framework, then allocates resources to applications 
within its own domain. Finally, a Shared-State scheme uses a Shared-State 
Scheduling approach, supporting multiple parallel schedulers. Each sched- 
uler is given a private, local, frequently updated copy of the global state for 
use in making local scheduling decisions. Once a scheduler makes a place- 
ment decision, it updates the shared copy of the global state in an atomic 
commit, and the time from state synchronisation to the commit attempt is 
called a transaction. Google Omega (Schwarzkopf et al. 2013; Burns et al. 
2016) uses the Shared-State scheme. Omega schedulers operate in parallel 
using lock-free optimistic concurrency control. Omega is also designed to 
support multiple distinct workloads having their own application-specific 
interfaces, state machines, and scheduling policies. 
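
The Shared-State transaction described above (synchronise a private copy,
decide locally, attempt an atomic commit, retry on conflict) can be
sketched as follows. This is a deliberately coarse, single-threaded
illustration and not Omega's implementation: the class and function
names are hypothetical, a single version counter stands in for conflict
detection, and a real system would need the commit to be genuinely atomic.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


class Conflict(Exception):
    """Raised when a commit is based on an out-of-date view of the state."""


@dataclass
class CellState:
    """Shared copy of the global state: free cores per machine."""
    free_cores: Dict[str, int]
    version: int = 0

    def snapshot(self) -> Tuple[int, Dict[str, int]]:
        """Give a scheduler its private copy, stamped with the version."""
        return self.version, dict(self.free_cores)

    def commit(self, based_on: int, machine: str, cores: int) -> None:
        """Apply a placement only if the state is unchanged since sync."""
        if based_on != self.version or self.free_cores[machine] < cores:
            raise Conflict(machine)
        self.free_cores[machine] -= cores
        self.version += 1


def schedule(state: CellState, cores: int, max_retries: int = 3) -> Optional[str]:
    """One optimistic transaction per attempt: sync, decide, try to commit."""
    for _ in range(max_retries):
        version, view = state.snapshot()
        candidates = [m for m, free in view.items() if free >= cores]
        if not candidates:
            return None
        machine = max(candidates, key=view.get)
        try:
            state.commit(version, machine, cores)
            return machine
        except Conflict:
            continue  # another scheduler committed first; resync and retry
    return None


state = CellState({"a": 4, "b": 8})
print(schedule(state, cores=4))  # b
```

No scheduler ever holds a lock while deciding, which is what allows many
schedulers to work in parallel; the cost is that some decisions are
discarded and repeated when two schedulers pick overlapping resources.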

Common cloud resource scheduling algorithms map applications to 
resources using resource availability metrics such as the number of avail- 
able CPU cores, the free memory, the available storage space, and other 
system-state information. These schedulers use as little information as pos- 
sible to make reasonable decisions in a timely manner. This approach is 
sufficient for a cloud composed of homogeneous resources. In contrast, 
heterogeneous clouds introduce a much higher degree of complexity for 
which conventional approaches to resource management are inadequate. 
Thus, new and innovative solutions are required to efficiently support the 
transition from the homogeneous to heterogeneous cloud. 


2.3.2 Resource Abstraction 


Current cloud management platforms are typically designed to manage 
either virtualised or containerised environments. Considering that the
traditional cloud consists of homogeneous resources based on
general-purpose processing units (CPU architectures) and standard hardware
components, virtualisation and containerisation technologies have dem- 
onstrated their ability, in many production environments, to abstract 
standard hardware resources. 

However, heterogeneity creates new challenges for existing resource
abstraction methods. Specifically, many computation accelerators, such as
MICs and GPUs, cannot simply be virtualised or containerised without
specific configurations being done at both the hardware and software
levels. In particular, different models and manufacturers of the same type of
computation accelerators may require different configurations on the host 
server (e.g., setting CPU features in the Basic Input/Output System and 
motherboard configurations) and in the software (e.g., changing kernel 
versions, updating operating system drivers, and choosing the appropriate 
hypervisor). This poses the challenge of how to flexibly use various 
resource abstraction methods to access different types of resources 
seamlessly. 


2.4 THE CLOUDLIGHTNING APPROACH 


The CloudLightning architecture has been constructed in an effort to 
address the challenges resulting from the transition to the emerging het- 
erogeneous cloud. It recognises that the complexities associated with 
resource management due to this transition are nontrivial, and it proposes 
the use of self-organisation and self-management as a potential way for- 
ward. Thus, the architecture is composed of components and services with 
the necessary support for self-organisation and self-management. The 
CloudLightning architecture demonstrates how specialised hardware can 
be seamlessly integrated and the problems of centralised resource manage- 
ment at scale can be addressed, whilst recognising the inevitable added 
complexity resulting from supporting heterogeneity. Figure 2.4 shows the 
overview of the CloudLightning architecture, including the Service 
Delivery Layer, the Cloud Management Layer, and the Infrastructure 
Layer. 


2.4.1 Infrastructure Organisation


The infrastructure organisation of CloudLightning is reminiscent of the
Warehouse Scale Computer concept in which the infrastructure is composed
of Cells. A Cell is composed of Racks, which in turn contain servers of
homogeneous hardware. CloudLightning additionally incorporates
heterogeneity by allowing different Racks to contain different computational
resources.

Fig. 2.4 An overview of the CloudLightning architecture showing how its
various components are organised into the classical conceptual cloud layers


2.4.2 Hardware Organisation


In a CloudLightning managed domain, physical servers are partitioned 
into groups based on geographical locations or regions; each of these 
partitions is called a Cell. A Cell is composed of a pool of heterogeneous
computational resources, known as the Compute Resource Fabric. In the
CloudLightning system, five elementary computational hardware types
are considered explicitly. These include commodity servers (CPUs),
servers with GPU accelerators, servers with MIC accelerators, servers with
FPGA accelerators, and Non-uniform Memory Access Scale
high-performance computers.

In a conventional data centre, physical racks are used to hold servers 
and switches. However, in a cloud deployment, the rack has no explicit 
identity that can be used to determine, from within the cloud software 
stack, where a particular compute/storage resource is physically located. 
To maintain information about groups of servers and to manage their 
resources, CloudLightning introduces virtual components called vRacks. 
A vRack contains a group of physical servers that share common proper- 
ties including hardware type, hardware compatibility, and network con- 
nection type. 


2.4.2.1 Resource Abstraction 

The Hardware Abstraction Layer (HAL) provides a logical view of the 
underlying cloud infrastructure directly to the Cloud Management Layer. 
The HAL places resources into vRacks. Each vRack contains a certain 
number of homogeneous resources. The size of each vRack is initially 
determined by the management complexity for the type of resources to be 
managed. During the evolution of the system, a vRack may negotiate with 
other vRacks to exchange information and to transfer resources to achieve 
system goals such as maximising resource utilisation, reducing power con- 
sumption, and improving the service delivery experience. 

When new hardware joins the CloudLightning managed domain, a 
dedicated Plug & Play interface is used to facilitate the connection of new 
hardware to the CloudLightning system. The newly connected hardware 
is required to expose available capacities and capabilities to the interface. 
In response, the interface will create CloudLightning-specific resources 
(CL-Resources) to represent the capabilities exposed. Depending on their 
type, these CL-Resources will be attached to an existing vRack, or if an 
appropriate vRack of this type is not available, a new vRack of an appropri- 
ate type is created. Where appropriate, the newly created vRack will be 
managed by a designated vRack Manager. This process is shown in Fig. 2.5. 
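
The attach-or-create behaviour of the Plug & Play interface can be
sketched as below. The class and method names are hypothetical and do not
come from the CloudLightning code base; grouping by hardware type and
network connection type follows the vRack properties listed earlier, and
hardware compatibility is omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class CLResource:
    """Hypothetical record of the capabilities one server exposes."""
    server_id: str
    hardware_type: str
    network_type: str


@dataclass
class VRack:
    """Group of CL-Resources that share the same properties."""
    hardware_type: str
    network_type: str
    resources: List[CLResource] = field(default_factory=list)


class HardwareAbstractionLayer:
    def __init__(self) -> None:
        self.vracks: List[VRack] = []

    def plug(self, server_id: str, hardware_type: str,
             network_type: str) -> VRack:
        """Register new hardware and return the vRack it was placed in."""
        resource = CLResource(server_id, hardware_type, network_type)
        for vrack in self.vracks:
            if (vrack.hardware_type, vrack.network_type) == (
                    hardware_type, network_type):
                vrack.resources.append(resource)  # matching vRack exists
                return vrack
        vrack = VRack(hardware_type, network_type, [resource])
        self.vracks.append(vrack)                 # otherwise create one
        return vrack


hal = HardwareAbstractionLayer()
hal.plug("s1", "GPU", "InfiniBand")
hal.plug("s2", "GPU", "InfiniBand")
hal.plug("s3", "FPGA", "Ethernet")
print(len(hal.vracks))  # 2
```

Keeping each vRack homogeneous means a vRack Manager only has to
understand one kind of resource, which is the simplification the HAL is
intended to provide to the Cloud Management Layer.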


2.4.3 The Cloud Management Layer


The CloudLightning management layer is shown in Fig. 2.4. The func- 
tional components and their relationships are explained in detail in the 
following sections.

Fig. 2.5 Support for heterogeneous resources using the Plug & Play interface
at the Hardware Abstraction Layer


A Cell Manager is the software component associated with each Cell. 
The Cell Manager receives an Application Requirements Document from 
the Gateway Service, and it acquires CL-Resources in response to the 
requirements articulated in that “document”. This can be done in at least 
one of two ways: either by allowing the user to select from a set of resources 
returned from a Resource Discovery phase or by allowing the system to 
assign appropriate resources immediately that meet the service require- 
ments. In the former case, resource reservation is required while users 
make their choice, and in the latter case no reservation is needed. 


2.4.3.1 CL-Resource Discovery 

The CL-Resource Discovery process is initiated when the Cell Manager 
receives an Application Requirements Document from the Gateway. This 
“document” contains a set of Blueprint Requirements and a set of Service 
Requirements for each service in that Blueprint. 

The function of the discovery process is to locate all of the possible 
CL-Resources that can be used to implement each of these services, con- 
sistent with particular constraints articulated in the list of Service 
Requirements. 

The discovery process can determine information about dynamically 
changing capabilities and capacities by communicating with a group of 
vRack Managers. From this information, the discovery process determines
the CloudLightning system’s ability to provide CL-Resources for each of
the possible Implementation Options mentioned in the Service Requirements.

To guarantee these options remain available until the selection process is 
complete, all of the associated CL-Resources must be reserved by the asso- 
ciated vRack Managers. Thus, resources are potentially reserved across 
multiple vRack Managers until the selection process determines that they 
should be acquired or released. All of these Implementation Options are 
then passed directly to the CL-Resource selection process. 


2.4.3.2 The CL-Resource Selection 

This process applies the remaining constraints articulated in the list of 
Service Requirements and constraints associated with the Blueprint 
Requirements to determine a solution set consistent with all of the 
Application Requirements. If at this stage the solution set is not unique, 
the selection process will choose a unique solution by considering the 
options that minimise the overhead for the CSP. The associated 
CL-Resources in the solution set are then acquired automatically and 
those CL-Resources not in the solution set are released. Once the 
CL-Resources are acquired, their handlers are passed back to the Gateway 
for subsequent use by the Deployment Manager. 
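
The reserve, select, and acquire-or-release sequence that links discovery
and selection can be sketched in a few lines. The model is heavily
simplified: each vRack Manager offers a single pool of identical
CL-Resources, and the `overhead` figure standing in for "overhead for the
CSP" is a hypothetical number; none of the names below are taken from the
CloudLightning implementation.

```python
from typing import List, Optional


class VRackManager:
    """Hypothetical manager tracking free, reserved, and acquired units."""

    def __init__(self, name: str, free: int, overhead: float) -> None:
        self.name = name
        self.free = free
        self.reserved = 0
        self.acquired = 0
        self.overhead = overhead  # assumed cost to the CSP of using this vRack

    def reserve(self, count: int) -> bool:
        if self.free < count:
            return False
        self.free -= count
        self.reserved += count
        return True

    def acquire(self, count: int) -> None:
        self.reserved -= count
        self.acquired += count

    def release(self, count: int) -> None:
        self.reserved -= count
        self.free += count


def discover(managers: List[VRackManager], count: int) -> List[VRackManager]:
    """Discovery: reserve on every vRack able to implement the service."""
    return [m for m in managers if m.reserve(count)]


def select(options: List[VRackManager], count: int) -> Optional[VRackManager]:
    """Selection: keep the lowest-overhead option, release the rest."""
    if not options:
        return None
    chosen = min(options, key=lambda m: m.overhead)
    for manager in options:
        if manager is chosen:
            manager.acquire(count)
        else:
            manager.release(count)
    return chosen


managers = [VRackManager("gpu-a", 4, 3.0),
            VRackManager("gpu-b", 8, 1.5),
            VRackManager("gpu-c", 1, 0.5)]
chosen = select(discover(managers, 2), 2)
print(chosen.name)  # gpu-b
```

Reserving during discovery is what guarantees that every option handed to
the selection step is still available when the choice is made; the price
is that resources are briefly held on vRacks that will not be used.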

A vRack Manager is associated with each vRack. The function of a 
vRack Manager is to manage all of the CL-Resources that can be exposed 
from its associated vRack. In addition, it can create/aggregate 
CL-Resources in/on its vRack, as necessary. When the vRack Manager 
aggregates CL-Resources in its vRack, it creates a new type of CL-Resource 
called a Coalition. This is one of the defining characteristics of the 
CloudLightning system in that it allows CL-Resources to be formed into 
groups of homogeneous CL-Resource types to implement specific services 
with those requirements. A vRack Manager is responsible for managing 
the physical servers in its vRack. The set of servers associated with vRacks 
may be re-allocated over time. Similarly, new servers may be added to a 
Cell and others may be removed. This may trigger the creation/destruction/reorganisation
of vRacks and their associated vRack Managers.

There are three functional components within each vRack Manager: a 
Resource Acquisition component, a Coalition Lifecycle Management 
component, and a Self-Organisation Agent. 


2.4.3.3 Resource Acquisition 

This component is activated by the selection process of the Cell Manager. 
It attempts to acquire CL-Resources; this can be guaranteed if they have 
been previously reserved. The CL-Resources being acquired may already
exist within the vRack or they may have to be dynamically created by the
vRack Manager. Once these CL-Resources have been acquired, their 
CL-Resource handlers are returned to the selection process of the Cell 
Manager. 


2.4.3.4 Coalition Lifecycle Management 

A Coalition is a special type of CL-Resource. In general, it represents a 
group of homogeneous CL-Resources, each of which exists within a single 
vRack. The vRack Manager may form a number of Coalitions, which may 
be persistent and used as a means of rapidly providing an implementation 
option for specific services. These persistent Coalitions are called Static 
Coalitions. The vRack Manager may also aggregate its CL-Resources, 
none of which may be a Coalition in itself, to form Coalitions dynamically 
in response to a specific CL-Resource acquisition request from the Cell
Manager. In managing dynamic CL-Resources, such as Coalitions, bin- 
packing strategies can be used to balance resource utilisation and power 
management. By appropriately managing the mix of static versus dynamic 
CL-Resources, faster service deployment can be balanced against potential 
savings on power consumption. 
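As an illustration of the kind of bin-packing strategy mentioned above, the sketch below uses first-fit decreasing, a common heuristic chosen here only as an example; the text does not state which strategy CloudLightning uses. Request sizes and the uniform `server_capacity` are assumed units (for instance, cores).

```python
def first_fit_decreasing(requests, server_capacity):
    """Pack resource requests onto as few servers as possible.

    Returns a list of servers, each a list of the request sizes placed on it.
    """
    servers = []  # each entry: [free_capacity, [placed sizes]]
    for size in sorted(requests, reverse=True):
        if size > server_capacity:
            raise ValueError(f"request of {size} exceeds server capacity")
        for server in servers:
            if server[0] >= size:          # first server with enough room
                server[0] -= size
                server[1].append(size)
                break
        else:
            # No active server fits the request: bring another server into use.
            servers.append([server_capacity - size, [size]])
    return [placed for _, placed in servers]


placement = first_fit_decreasing([4, 8, 1, 4, 2, 1], server_capacity=10)
```

Packing requests tightly leaves more servers idle, which is what allows idle servers to be considered for power management; spreading requests instead would favour headroom over power savings.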

A Coalition is an entity that can be seen as an execution environment, 
formed by grouping together a number of CL-Resources. Coalitions may 
exist inside a single vRack and so each is under the control of a single vRack
Manager. The constituency of a Coalition may span multiple servers within
that vRack. Coalitions are formed by a vRack Manager in response to spe- 
cific service requirements. The vRack Manager may decide to persist 
Coalitions for improved service delivery, and these Coalitions are called 
Static Coalitions. Coalitions may also be formed dynamically by a vRack 
Manager again in response to specific service requirements. This dynamic 
formation may involve the dynamic creation of some or all of the constitu- 
ent CL-Resources. When a dynamically formed Coalition is subsequently 
disbanded, its dynamically created constituents are destroyed, but any 
static CL-Resources used in its formation are left unchanged and persist to 
be reused in subsequent Coalition formations. Figure 2.6 illustrates a 
number of Coalitions in a vRack. From the illustration, it can be seen that 
a Coalition can exist entirely within a single server or can span multiple 
servers within the same vRack. In the situation that a single vRack Manager 
does not contain sufficient resources to satisfy a specific requirement, it 
may negotiate with an adjacent vRack Manager to acquire the appropriate 
resources.

Fig. 2.6 Illustration of resource coalition


2.4.3.5 Self-Organisation Agent

The vRack Manager is a basic component of self-organisation in the
CloudLightning system. vRack Managers organise themselves into groups 
and collectively determine local optimum strategies for CL-Resource 
management. The Self-Organisation Agent maintains information about 
other vRack Managers in the same group, it exchanges local state informa- 
tion with the Self-Organisation Agent in those vRack Managers, and it 
triggers power management decisions in the servers contained in its vRack. 
Negotiations between the various Self-Organisation Agents within a vRack
Manager group may result in the migration of servers from one vRack to 
another. Since CL-Resources may span multiple servers in the same vRack, 
any proposed migration must not violate the invariants associated with 
maintaining coalitions.

A vRack Manager Group is composed of a group of vRack Managers
whose vRacks contain the same type of hardware. The Self-Organisation 
Agents of the vRack Managers within the group exchange information to 
optimally respond to resource discovery requests from the Cell Manager.
Together, they decide whether, and where, the required CL-Resources are
located or could be created. In making these decisions, the individual
interests of each vRack Manager and the interests of the group as a whole
are taken into account. This distributed decision process embodies the
self-organisation strategy, which evolves to meet global objectives
determined from the requirements driving the architecture design. vRack
Managers are distinguished by the vRack hardware type. This distinction 
gives rise to a classification of the vRack Managers. 


2.4.3.6 Classification of vRack Managers 

Type-A vRack Managers are the most generic. They manage a collection 
of hardware resources of the same type (see Fig. 2.7). In one instance, 
these can be commodity hardware; in another, they could be CPU-GPU 
pairs, CPU-Data Flow Engine (DFE) pairs, or CPU-MIC pairs. 

Type-B vRack Managers are more specialised. They manage a collec- 
tion of HPC machines of the same type, each of which is exposed to the 
CloudLightning system as a single CL-Resource (see Fig. 2.8). If two or 
more HPC machines are managed by the same vRack Manager, then it is 
assumed that they are identical in all respects. This ensures that the 
CL-Resources exposed to the vRack Manager are the same. 

Type-C vRack Managers manage a collection of hardware resources of 
the same type co-located on a high-speed interconnect (see Fig. 2.9). 
These can be commodity servers, or in other instances, they could be serv- 
ers with GPU accelerators, servers with MIC accelerators, or servers with 
DFE accelerators. 


Fig. 2.7 vRack Manager Type-A


2.4.3.7 vRack Manager Activities

Type-A vRack Managers can only group with other Type-A managers (see 
Fig. 2.10). These groups can self-organise (e.g., in an attempt to improve 
power consumption). Self-organising involves servers migrating between 
vRack Managers in the same group. These groups also self-manage to 
improve service delivery by deciding locally which member of the group
is the best to respond to particular service requests. 

Neither Type-B nor Type-C vRack Managers engage in self-organisation. 
In general, the CL-Resources being managed are created from hardware 
of different types and thus cannot migrate to other vRack Managers. However,
in principle, Type-B (see Fig. 2.11) vRack Managers can group together 
and Type-C (see Fig. 2.12) vRack Managers can group together in an 
effort to reduce the overall number of vRack Manager Groups. This in 
turn will simplify the administration required in the Cell Manager. 


2.4.4 Service Delivery Model


The large numbers of powerful, and increasingly heterogeneous, resources
being made available by CSPs make possible the deployment of large
data- and compute-intensive applications. In many cases, these applications,
which are quite often legacy, are monolithic in construction
and require bespoke execution environments. Consequently, it can be
challenging to deploy them in the cloud without acquiring IaaS and employing
specialised engineering knowledge.

In this cloud usage model, the provider has no control over the effec- 
tive utilisation of resources nor have cloud application developers an 
incentive to engage in costly customisation to increase resource efficiency 
when, regardless of the efficiency achieved, they are paying for the entire 
resource. Composing cloud services from workflows of large, possibly 
legacy, applications will most likely be the trend as support for emerging 
Big Data applications requires sophisticated, multi-phase data processing. 
Being essentially independent, the required resources for the applications 
that run in each of these phases may differ greatly in number and type, and 
hence the problems of cost and efficiency could be significantly exacer- 
bated. Clearly, an approach is needed to allow the sophistication of the 
cloud to evolve in an efficient and cost-effective manner. It can be seen 
that there is no clear distinction between the concerns of cloud application 
developers and those of the CSP. The concerns of the CSP centre
around efficient management and utilisation of cloud resources, and
the concerns of cloud application developers centre on the specification, 
deployment, and service-level agreements (SLAs) associated with their 
applications. 

To address this usability question, CloudLightning uses a Blueprint-oriented
cloud application design and deployment approach. In this context,
Blueprints are workflows in which nodes (Service Elements) represent
extant applications and edges distinguish the phases of the Blueprint
execution where particular applications are active. All Service Elements are
stored in a Service Catalogue, which is managed by the Gateway Service
(Fig. 2.4). Cloud application developers may choose Service Elements
from the Service Catalogue and link Service Elements to realise the desired
business logic. Attributes and parameters can be specified on a per Service
Element basis. Altogether, the Service Elements, their linkages, and associated
attributes and parameters comprise the application Blueprint, as
shown in Fig. 2.13. The use of the Blueprint would drastically alter the
current cloud usage model in that it would shift the burden of resource
discovery, provisioning, and deployment from the cloud application
developers to CSPs. This shift would greatly reduce the cost to, and the level of
expertise needed by, cloud application developers while simultaneously
giving CSPs full control over, and affording opportunities for the efficient use
of, the cloud resources. 
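A Blueprint of this kind can be pictured as a small workflow graph. The sketch below is only an assumed data model: the class names, the `parameters` field, and the reading of links as "upstream runs before downstream" are illustrative choices, not the CloudLightning service description language.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceElement:
    name: str
    parameters: dict = field(default_factory=dict)  # per-element attributes


@dataclass
class Blueprint:
    elements: dict = field(default_factory=dict)    # name -> ServiceElement
    links: list = field(default_factory=list)       # (upstream, downstream) pairs

    def add(self, element):
        self.elements[element.name] = element

    def link(self, upstream, downstream):
        if upstream not in self.elements or downstream not in self.elements:
            raise KeyError("both Service Elements must be in the Blueprint")
        self.links.append((upstream, downstream))

    def phases(self):
        """Group Service Elements into phases that respect the links."""
        pending = {n: {u for u, d in self.links if d == n} for n in self.elements}
        order = []
        while pending:
            ready = sorted(n for n, deps in pending.items() if not deps)
            if not ready:
                raise ValueError("links form a cycle")
            order.append(ready)
            for n in ready:
                del pending[n]
            for deps in pending.values():
                deps.difference_update(ready)
        return order


bp = Blueprint()
for name in ("ingest", "simulate", "visualise"):   # hypothetical Service Elements
    bp.add(ServiceElement(name))
bp.link("ingest", "simulate")
bp.link("simulate", "visualise")
order = bp.phases()
```

Because the developer supplies only this description, resource discovery and placement for each phase can be left to the provider.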

Fig. 2.13 CloudLightning Blueprint


2.4.5 Advanced Architecture Support 


The design philosophy of the CloudLightning architecture is fundamentally
different from that of clouds currently in operation. As a result,
CloudLightning has different strategies for realising various important
properties, including auto-scaling, data locality, high availability (HA), and
networking organisation.


2.4.5.1 Auto-Scaling 

Scalability is one of the most important features in cloud computing. The
CloudLightning system supports scalability provided that Blueprint developers
explicitly indicate in the Blueprint which services are expected to
require scaling. This explicit indication can be given by enclosing the services
to be scaled within a Scaling Envelope. This envelope embeds a system service
into the Blueprint to monitor the load on the enclosed services. When a
pre-defined load threshold is crossed, this system service will dynamically
acquire the appropriate resources from the CloudLightning system to scale
the user service appropriately. By using the envelope in the Blueprint,
consumers can see that execution of that Blueprint may result in charges
relating to extra resources that cannot be determined statically. Additionally,
the CloudLightning auto-scaling scheme allows application developers to
explicitly specify how to handle service elasticity and partition data in a
fine-grained manner. The scaling envelope and its associated impact on the
CloudLightning system are illustrated in Fig. 2.14.
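The threshold behaviour of the envelope can be sketched as below. The class, the per-instance load threshold, the `max_instances` cap, and the `acquire` callback are all hypothetical; the sketch only scales up and does not model releasing resources.

```python
class ScalingEnvelope:
    """Wraps a service and scales it when a load threshold is crossed."""

    def __init__(self, acquire, threshold=0.8, max_instances=4):
        self.acquire = acquire              # assumed callback into the resource layer
        self.threshold = threshold          # load per instance that triggers scaling
        self.max_instances = max_instances  # assumed cap on extra charges
        self.instances = [acquire()]        # start with a single instance

    def observe(self, total_load):
        """Report the current total load; acquire resources if needed."""
        while (total_load / len(self.instances) > self.threshold
               and len(self.instances) < self.max_instances):
            self.instances.append(self.acquire())
        return len(self.instances)


counter = iter(range(100))
envelope = ScalingEnvelope(acquire=lambda: f"coalition-{next(counter)}")
sizes = [envelope.observe(load) for load in (0.5, 1.7, 2.0, 9.0)]
```

The cap makes explicit the point made in the text: the number of instances, and therefore the charge, depends on the load observed at run time rather than on anything that can be read statically from the Blueprint.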


Fig. 2.14 Auto-scaling using CL Envelope Mechanism


2.4.5.2 High Availability 

HA refers to the mechanisms used to ensure continuity of service delivery. 
If an infrastructure component (e.g., network equipment or server) fails, 
redundancy and flexible load balancing mechanisms may be employed to 
ensure that the overall service remains available. HA will be addressed 
within the CloudLightning system by using a Hot-Standby server for each 
of its software components. To provide HA of the services running on the 
CloudLightning system, service replication may be used. Since replication 
has an associated cost, the decision to use it should be made by the 
Blueprint developers by expressing that preference in the Blueprint. An 
envelope mechanism similar to the one used for auto-scaling may be used. 


2.4.5.3 Data Locality

Data locality, defined as keeping data close to the computation, is one
of the most important factors considered for HPC/HTC and Big Data
applications. In the cloud environment, the concept of data locality is
not well defined. The CloudLightning system does not propose to 
introduce mechanisms to give Blueprint developers control over the 
data locality, unless that control is provided explicitly by specialised 
CL-Resources dedicated to high-speed data processing. Thus, this func- 
tionality would have to be exposed to the Blueprint developers at the 
Blueprint level. 

In the CloudLightning system, data locality constraints may have to be 
considered at various levels in the self-managed and self-organised compo- 
nents; thus, it may be necessary to develop strategies for data locality at the 
Coalition, vRack, and Cell levels. For instance, suppose a given Blueprint
consists of two services, Service_A and Service_B, and Service_A generates a
significant amount of data that is further processed by Service_B (this
information is specified between Service_A and Service_B in the
Blueprint specification). This information is then a potential data locality
requirement for the Blueprint, which the Cloud Management Layer
subsequently uses to deploy the Blueprint on appropriate resources. On
the other hand, in different application domains, such as HPC/HTC and
Big Data, many applications require local storage for computation. In
cases where data locality is a predominant concern, the CloudLightning
system is designed to use Network Attached Storage (NAS) over
high-bandwidth links in order to minimise the cost of data transmission over
the network. However, in cases where NAS is not present, local persistent
storage can also be used.


2.4.5.4 Dynamic VPN Creation for Blueprint Service Execution 

To create an isolated execution environment for each Blueprint, the 
CloudLightning Management Layer creates dedicated Virtual Private 
Networks (VPNs) for each Blueprint, as shown in Fig. 2.15. The services 
within a Blueprint need to communicate with each other; these services are
mapped onto dedicated Coalitions, which may be running on different 
physical servers. In addition, the Coalitions running various services of a 
Blueprint may extend over multiple vRacks. Regardless of their physical 
location in the CloudLightning system, dedicated VPNs created for each 
Blueprint will ensure that CL-Resources and the data exchange between 
them remain secure and private to the Blueprint from which they are 
constructed.


2.5 CONCLUSION


The trend for hardware vendors to create more specialised offerings, capable 
of providing faster, more accurate, and power-efficient solutions, looks set to 
continue. The increasing demand for this hardware and for access to HPC is 
driving an evolution of cloud computing that offers versatile services. A het- 
erogeneous cloud at scale embodies many hardware types, each with differ- 
ent cost/performance/power profiles. This, together with the attempt to 
satisfy the disparate needs of a large and varied customer community, makes 
the heterogeneous cloud a complex system. In evolving to heterogeneous 
clouds, CSPs may no longer offer Software/Platform/Infrastructure as a 
service, separately. Instead, CSPs may undertake to offer a combination of 
these to the customer on demand. This approach would require a service 
orchestration designer tool that could be used to compose a set of services 
together with an appropriate expression of service-level requirements into a 
cloud application Blueprint. From this perspective, customers no longer 
need to be concerned about how solutions are provided; rather customers 
can concentrate on describing the problem to be solved. This also gives more 
control to the CSP over how to provision and optimise resources, to meet 
both consumer needs and system requirements. However, the complexity of
managing resources in a heterogeneous cloud environment should not be 
underestimated. Self-organisation is one of the tools that can be employed to 
effectively address this complexity. More specifically, in a properly designed 
self-organising approach, global system objectives may be met as the by- 
product of emergent behaviour resulting from the application of low-level 
self-organising rules and strategies; this approach has been adopted by the 
CloudLightning project. The next chapter addresses the self-organising and
self-managing approach to cloud management at the level of the CloudLightning
architecture, together with details for developing effective cloud organisation
strategies and efficient resource management algorithms.


3.1 INTRODUCTION


A general framework for self-organisation and self-management (SOSM) 
is needed to support hierarchical architectures composed of autonomous 
components such as those described in the CloudLightning (CL) archi- 
tecture discussed in Chap. 2. This chapter introduces a novel framework 
for SOSM developed to support CloudLightning. The next section pres- 
ents key concepts in SOSM and how they are used to augment the 
CloudLightning architecture. The various SOSM mechanisms that enable 
components within CloudLightning to communicate, modify behaviour, 
make decisions, and cooperate with each other are then presented. 
Components may use different strategies for SOSM. As such, exemplar 
strategies are presented and illustrated in the context of CloudLightning 
through example scenarios. 


3.2 KEY CONCEPTS 


As discussed in Chap. 2 and mentioned above, the CloudLightning archi- 
tecture is composed of autonomous components. Each component is 
equipped with various Strategies. These can be self-managing and/or
self-organising strategies, and define how components at various levels in the
hierarchy should evolve towards some ideal state known as the compo- 
nent’s local goal. 

In general, decisions being made by components at a particular level in 
the hierarchy can directly influence evolution in the adjacent levels. These 
influences may come from the top down, or from the bottom up. When 
coming from an upper level in the hierarchy, the process is called Directed 
Evolution. Directed Evolution signals the desire of the upper level to have 
the components, in the level underneath, change in operation or in con- 
figuration, to align with the goal of the upper level. Since components at a 
particular level also have local goals, the overall evolution that is brought 
about at that level should respect progress towards those local goals, while 
simultaneously accommodating the Impetus associated with the Directed 
Evolution process. An Impetus is communicated in the form of a tuple of 
values (i.e., a vector), known as a Weight. In a similar manner, a lower level 
in the hierarchy may directly influence the level above. This can be seen as 
Feedback from the lower level. This Feedback, in the form of tuples of 
values (i.e., vectors), known as Metrics, is derived from the operations of 
the components at the lower level and gives the upper level a Perception of
how the lower level is changing and evolving. Perceptions can be used to
determine subsequent Directed Evolution decisions. 

As part of the self-organisation process, the interaction of two or more 
components, in any level of the hierarchy, may result in component cre- 
ation, component destruction, component splitting, and/or component 
merging. 

A measure of how close a component is to stasis, and hence how suitable
its operating characteristics are for contributing to the global goal, is referred
to as its Suitability Index (SI). In principle, any component subject to Impetus
and possessing a Perception has an associated SI. Thus, in the CloudLightning
framework, the goal state of those components, and the global goal of the
system, can be cast in terms of maximising the respective SIs.
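To make the idea concrete, the sketch below computes an SI from a Weight vector (Impetus) and a Metrics vector (Perception). The weighted-sum form is purely an assumption for illustration, since the text does not fix how the two are combined; the component names and values are likewise invented.

```python
def suitability_index(weights, metrics):
    """Assumed SI: weighted sum of metric values (one weight per metric)."""
    if len(weights) != len(metrics):
        raise ValueError("weights and metrics must have the same dimension")
    return sum(w * m for w, m in zip(weights, metrics))


# Hypothetical Impetus received from the level above.
impetus = (0.5, 0.25, 0.25)

# Hypothetical Perceptions reported by two components at the level below.
perceptions = {
    "component-a": (0.8, 0.4, 0.0),
    "component-b": (0.2, 1.0, 1.0),
}

# The component with the highest SI is the most suitable under this Impetus.
best = max(perceptions, key=lambda name: suitability_index(impetus, perceptions[name]))
```

Changing the weights changes which component scores highest, which is how an upper level can steer the level below without dictating individual decisions.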

In summary, the CloudLightning framework defines a number of 
mechanisms as follows: 


• A mechanism to communicate Impetus, through the transmission of
weights, from a level in the hierarchy to the level below. This mechanism
allows a component, higher in the hierarchy, to steer the evolution
of the components immediately below it in the hierarchy.

• A mechanism to allow components to communicate Feedback,
through the transmission of metrics, to components in the next level
up in the hierarchy.

• A mechanism to modify the behaviour of components in response to
Impetus and Feedback.

• Mechanisms to allow components to make decisions in accordance
with various strategies to maximise their individual SIs.

• Mechanisms to allow components at the same level in the hierarchy
to cooperate with each other in accordance with various strategies to
maximise collective and/or individual SIs.


All of these concepts, and their interactions, are visualised in Fig. 3.1. 

The CloudLightning framework provides these mechanisms to enable 
the SOSM strategies being deployed and performed by individual compo- 
nents to move nearer to their goal state. Within this framework, each com- 
ponent can make local decisions in accordance with various SOSM 
strategies based on its current state (from the feedback loops) and imposed 
Impetus (from the directed evolution processes), maximising its SI.
Overall, self-management is implemented at a system level, allowing the
whole system to evolve towards its business/system objectives.


3.3 AUGMENTING THE CLOUDLIGHTNING ARCHITECTURE


The CloudLightning architecture is initially augmented to include explicit 
entry points to the vRack Manager Groups. It can be seen from the previous
chapter that these groups partition the resource space into different types
of CL-Resources. This partitioning speeds up resource selection, since at 
most one CL-Resource type can be returned by the CloudLightning sys- 
tem for each service. The entry points into the differently typed vRack 
Manager Groups add an additional component to the CloudLightning 
architecture. Because of its routing characteristics described above, this 
component is called a pRouter. Figure 3.2 depicts this component in the 
augmented architecture. 

From Fig. 3.2, it can be seen that there is an entry point into each 
vRack Manager Group, of the same CL-Resource type, hanging from 
each pRouter. These partition the space into smaller sets of CL-Resources 
of the same type. These entry points add yet another component to the
CloudLightning architecture. Because this component connects all
vRack Managers in the same group, it acts as a switch and is called a
pSwitch. Figure 3.3 depicts this component in the augmented
architecture.

Fig. 3.2 Augmented CloudLightning architecture to include pRouters

It can be seen that the final augmented architecture forms a tree struc- 
ture in which the root node corresponds to the Cell. The children of the 
Cell are pRouters, and there is at least one pRouter for each distinct 
CL-Resource type. The children of a pRouter are pSwitches. pSwitches 
partition the Virtual Rack Managers (vRMs), managing the same 
CL-Resource type, into groups. The number of pSwitches per pRouter is 
not fixed over time, neither is the size of the vRM groups managed by 
each pSwitch. In the following sections and chapters of this book, it
will be seen that pSwitches and vRMs can self-organise within groups, 
which are called Cooperatives, to emphasise their self-organising nature. 
To prohibit the creation of Cooperatives with different CL-Resource 
types, pSwitch Cooperatives cannot span pRouters. Similarly, to minimise 
administrative overhead and to simplify coalition formation, vRM 
Cooperatives (formerly called vRack Manager Groups) cannot span 
pSwitches.

As the CloudLightning system evolves, it is anticipated that the number
of pSwitches connected to a pRouter will change and will converge to 
some optimal number with respect to the global goal. This goal is derived 
from the Directed Evolution coming from the pRouter and from the 
pSwitch’s efforts to achieve its local goal state. As part of the self- 
organisation process, pSwitches can be created, destroyed, merged, and 
split. In addition, pSwitches, within the same Cooperative, may exchange 
vRMs to optimise management. Together, the pRouters and the pSwitches 
form a reconfigurable and self-optimising switching fabric. 

Similarly, it is anticipated that the number of vRMs connected to a 
pSwitch will change and will converge to some optimal number derived 
from the Directed Evolution coming from the pSwitch and from the 
vRM’s efforts to achieve its local goal state. As part of the self-organisation 
process, vRMs can be created, destroyed, merged, and split. In addition, 
vRMs, within the same Cooperative, may exchange CL-Resources in an 
effort to maximise CL-Resource utilisation, minimise energy consump- 
tion, and facilitate coalition formation and management optimisation. 

An important driving force behind the evolution of the CloudLightning 
system is the sequence of services/tasks that the system is required to 
execute. From the previous chapter, it can be seen that the process of 
maintaining a separation between resource and service life-cycles involves 
using the CloudLightning system to autonomously locate appropriate 
resources to execute each specific service/task. As part of this process, a 
description of these resources is passed to the CloudLightning system in 
an attempt to match appropriate resources with the service/task request. 
The term resource prescription (subsequently referred to simply as pre- 
scription) is introduced to refer to this description, and hence the pRouter 
is a prescription Router and the pSwitch is a prescription Switch. 
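The routing of a prescription down the tree (Cell, pRouter, pSwitch, vRM) can be sketched as follows. The class names mirror the components in the text, but the prescription format, the `free` counter, and the "most free capacity" choice are assumptions standing in for the SI-based decisions described later; the sketch also simplifies to exactly one pRouter per CL-Resource type.

```python
class VRM:
    def __init__(self, name, free):
        self.name, self.free = name, free   # free: available CL-Resources (assumed)


class PSwitch:
    def __init__(self, vrms):
        self.vrms = vrms

    def route(self, amount):
        # Pick the vRM in this Cooperative with the most free capacity.
        candidates = [v for v in self.vrms if v.free >= amount]
        return max(candidates, key=lambda v: v.free, default=None)


class PRouter:
    def __init__(self, resource_type, pswitches):
        self.resource_type, self.pswitches = resource_type, pswitches

    def route(self, amount):
        hits = [v for v in (s.route(amount) for s in self.pswitches) if v is not None]
        return max(hits, key=lambda v: v.free, default=None)


class Cell:
    def __init__(self, prouters):
        self.prouters = {r.resource_type: r for r in prouters}

    def route(self, prescription):
        # The prescription's type selects the pRouter; unknown types fail fast.
        router = self.prouters.get(prescription["type"])
        return router.route(prescription["amount"]) if router else None


cell = Cell([
    PRouter("gpu", [PSwitch([VRM("vrm-1", 2), VRM("vrm-2", 6)]),
                    PSwitch([VRM("vrm-3", 4)])]),
    PRouter("cpu", [PSwitch([VRM("vrm-4", 16)])]),
])
target = cell.route({"type": "gpu", "amount": 3})
```

Because each pRouter serves a single CL-Resource type, a prescription only ever visits the subtree that could satisfy it.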

vRMs form the lowest software level in the hierarchical organisation of 
the CloudLightning system. The next level up in this hierarchy is formed 
by grouping vRMs of the same type into Cooperatives. The elements of 
the Cooperatives, that is, its vRMs, self-organise by exchanging 
CL-Resources appropriately, to enable optimal management. Similarly, 
the elements of the pSwitch level self-organise by exchanging vRMs appro- 
priately to enable optimal management. Finally, the elements of the 
pRouter level, that is, groups of pSwitches, self-organise by exchanging 
pSwitches appropriately to enable optimal management. All of these self- 
organising actions take place simultaneously resulting in the emergence of 
pathways through the hierarchy designed to optimise the ongoing
propagation of resource prescriptions through the system.


3.4 SELF-ORGANISATION AND SELF-MANAGEMENT IN CLOUDLIGHTNING ARCHITECTURE


The general SOSM framework is mapped to the augmented hierarchical 
CloudLightning architecture outlined in the previous section. In the
CloudLightning architecture, the autonomous components are the Cell, 
the pRouters, the pSwitches, and the vRMs. This framework provides 
Directed Evolution, self-management, and self-organisation mechanisms. 


3.4.1 Directed Evolution 


Directed Evolution is a mechanism to communicate a changing force 
throughout the system in a manner which effectively allows a component, 
higher in the hierarchy, to steer the evolution of the components
immediately below it.


3.4.1.1 The Goal State

The goal of each component at all levels in the hierarchy is to maximise its
SI.

The SI, $\eta$, is defined to be a combination of the Impetus and Perception
expressed through a function $\eta(I, P)$ such that

$$I \in \mathbb{R}^N,\ P \in \mathbb{R}^N \rightarrow \eta(I, P) \in \mathbb{R},$$

where $N$ is the number of parameters used to express Impetus and Perception.

Note that, in the Cell, the SI is calculated per resource type.

The goal state for the pRouter and the pSwitch is:

$$\arg\max\ \eta(I(w), P(m)), \quad w, m \in \mathbb{R}^N \tag{3.1}$$

where $w$ is an N-dimensional vector of weights corresponding to the
Impetus and $m$ is an N-dimensional vector of metrics obtained from the
lower levels. Equivalently, the goal state for the vRM is:

$$\arg\max\ \eta(I(w), P(d)), \quad w \in \mathbb{R}^N,\ d \in \mathbb{R}^M \tag{3.2}$$

where $w$ is an N-dimensional vector of weights corresponding to the
Impetus and $d$ is an M-dimensional vector of metrics obtained from the
Telemetry service.


3.4.1.2 Cell State

The Cell state is a set of vector tuples and function tuples of the form:

where $n$ is the number of different pRouter types and $w$ is the weight
calculated by the Cell to effect steering. The tuple $(w_i, m_i)$ represents the
weights and metrics of the $i$-th pRouter, respectively, where
$w_i \in \mathbb{R}^N$, $m_i \in \mathbb{R}^N$. The corresponding function tuple
is used to calculate the Impetus and Perception
vectors, respectively, for each CL-Resource type maintained by each
pRouter.

Since the Cell is at the highest level in the hierarchy, weights may be 
determined by the flow of tasks into the system and/or by local decisions 
made in an effort to move towards an objective goal state. 


3.4.1.3 pRouter State and pSwitch State 

The pRouter and pSwitch states can be described as a vector tuple $(w, m)$,
representing weights and metrics, where $w \in \mathbb{R}^N$, $m \in \mathbb{R}^N$,
together with a function tuple used to calculate Impetus and Perception, respectively.


3.4.1.4 vRM State 

vRM state can be described as a vector tuple $(w, d)$, representing weights
and metrics, where $w \in \mathbb{R}^N$, $d \in \mathbb{R}^M$, together with a
function tuple used to calculate Impetus and Perception, respectively.


3.4.1.5 Steering by the Cell 

There are at least two mechanisms for specifying a global goal state, G. The 
first is an objective goal specified to meet a specific business case. This can 
be set in a Cell, and in conjunction with the current local state of that Cell, 
adjustments can be made to the weights and applied to the underlying 
pRouters to steer them in that direction. By responding to this Impetus 
appropriately, the system will tend towards the goal state: 


$$\vec{I}_{Cell} = \mu\big(\vec{I}'_{Cell}, G_{Cell}, T_i\big) \qquad (3.4)$$


where $\vec{I}'_{Cell}$ is the current Impetus of the Cell, $\vec{I}_{Cell}$ is
the new Impetus of the Cell, $G_{Cell}$ is the goal state of the Cell, and
$T_i$ are resource prescriptions.
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Alternatively, the global goal state of the system can be expressed as a 
maximisation of the local goal state of the Cell. That is: 


$$\arg\max \; \eta_i(\vec{I}, \vec{P}), \quad i = 1, \ldots, n, \quad \vec{I}, \vec{P} \in \mathbb{R}^N \qquad (3.5)$$


where $\eta_i$ is the suitability of the $i$-th pRouter attached to the Cell.


3.4.1.6 Steering by the pRouter 
Steering by a pRouter is a mechanism for calculating and transmitting an 
Impetus to its attached pSwitches: 

Impetus is a function such that: 


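$$\vec{I}_{pRouter} = \mu\big(\vec{I}'_{pRouter}, \vec{I}_{Cell}\big), \quad \vec{I}_{pRouter} \in \mathbb{R}^N, \; \vec{I}_{Cell} \in \mathbb{R}^N \qquad (3.6)$$


where $\vec{I}'_{pRouter}$ is the previous Impetus of the pRouter. Here
$\vec{I}_{Cell}$ represents the weight coming from the Cell.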


3.4.1.7 Steering by the pSwitch 
Steering by a pSwitch is a mechanism for calculating and transmitting an 
Impetus to its attached vRMs: 


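$$\vec{I}_{pSwitch} = \mu\big(\vec{I}'_{pSwitch}, \vec{I}_{pRouter}\big), \quad \vec{I}_{pSwitch} \in \mathbb{R}^N, \; \vec{I}_{pRouter} \in \mathbb{R}^N \qquad (3.7)$$


where $\vec{I}'_{pSwitch}$ is the previous Impetus of the pSwitch. Here
$\vec{I}_{pRouter}$ represents the weight coming from the pRouter.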


3.4.2 Self-Management Mechanisms


The self-managing components in the system include (a) pRouters and 
pSwitches, managing prescription routing, metrics, and weights; and (b) 
vRMs, managing task execution, metrics, weights, and CL-Resources. 


3.4.2.1 Mechanism to Send Metrics from a vRM to pSwitch 

A separate assessment function is executed in each vRM for each of the N
metrics, and the results are passed as an N-dimensional vector to the
pSwitch associated with that vRM.
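
As an illustration of this mechanism, the following minimal sketch applies one assessment function per metric to a Telemetry record and collects the results into an N-dimensional vector. The telemetry keys and the two assessment functions (`utilisation`, `power_headroom`) are hypothetical and are not taken from the CloudLightning implementation.

```python
from typing import Callable, Dict, List

Telemetry = Dict[str, float]
Assessment = Callable[[Telemetry], float]


def utilisation(d: Telemetry) -> float:
    # Hypothetical metric: fraction of cores currently in use.
    return d["cores_used"] / d["cores_total"]


def power_headroom(d: Telemetry) -> float:
    # Hypothetical metric: share of the power budget that is still unused.
    return 1.0 - d["power_watts"] / d["power_max_watts"]


def assess(d: Telemetry, functions: List[Assessment]) -> List[float]:
    """Apply one assessment function per metric; the result has N = len(functions) entries."""
    return [f(d) for f in functions]


telemetry = {
    "cores_used": 12.0,
    "cores_total": 48.0,
    "power_watts": 300.0,
    "power_max_watts": 400.0,
}
# The N-dimensional vector a vRM would pass to its pSwitch (here N = 2).
metrics = assess(telemetry, [utilisation, power_headroom])
```

Adding a metric in this sketch only requires appending another function to the list, which mirrors the idea of one assessment function per metric.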
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3.4.2.2 Mechanism to Send Metrics from a pSwitch to pRouter 

A number of N-dimensional vectors will arrive at a pSwitch (one from 
each vRM in the cooperative defined by that pSwitch), and each of these 
is combined to derive a new N-dimensional vector. This represents the 
pSwitch’s Perception of the suitability of the underlying vRM cooperative 
to accept new tasks. This Perception can be customised by choosing the 
specific manner in which the input N-dimensional vectors are combined. 
The resulting N-dimensional vector is passed to the pSwitch’s pRouter. 
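
The combination step can be sketched as a component-wise aggregation with a pluggable combiner. The function name `perceive` and the choice of the arithmetic mean as the default are assumptions made for illustration; the text only states that the manner of combination is customisable.

```python
from statistics import fmean
from typing import Callable, List, Sequence

Vector = List[float]
Combiner = Callable[[Sequence[float]], float]


def perceive(metric_vectors: Sequence[Vector], combine: Combiner = fmean) -> Vector:
    """Combine several N-dimensional metric vectors into one N-dimensional Perception."""
    if not metric_vectors:
        raise ValueError("at least one metric vector is required")
    n = len(metric_vectors[0])
    if any(len(m) != n for m in metric_vectors):
        raise ValueError("all metric vectors must have the same dimension")
    # zip(*...) groups the k-th component of every vector together.
    return [combine(component) for component in zip(*metric_vectors)]


from_vrms = [[0.25, 0.5], [0.75, 1.0]]
average_view = perceive(from_vrms)                   # mean of each component
pessimistic_view = perceive(from_vrms, combine=min)  # worst case of each component
```

Passing a different combiner (for example `min` or `max`) changes how optimistic the resulting Perception is without changing the dimension of the vector passed upwards.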


3.4.2.3 Mechanism to Send Metrics from pRouter to Cell 

A number of N-dimensional vectors will arrive at a pRouter (one from
each pSwitch in the cooperative defined by that pRouter), and these are
once again combined to derive an N-dimensional vector representing the
local state of that pRouter. This state can be viewed as the pRouter’s
Perception of the suitability of the underlying pSwitch cooperative to
accept new tasks. This Perception can also be customised by choosing the
specific manner in which the input N-dimensional vectors are combined.
This N-dimensional vector is passed to the Cell.


3.4.2.4 Mechanism to Send Weights from Cell to pRouters 

Weights sent from a level in the hierarchy to a lower level represent the 
desire of the transmitting level to evolve in a particular direction. Since the 
Cell is at the highest level in the hierarchy, the sending of weights to the 
pRouters is the first step in the process of Directed Evolution. There are 
many strategies that the Cell can employ to determine how these weights 
change from time to time in the CloudLightning system. In all cases, these 
weights are sent to each pRouter as an N-dimensional vector representing 
the desired/calculated change to the progression of the Directed 
Evolution. 


3.4.2.5 Mechanism to Send Weights from pRouters to pSwitches 

After receiving an updated N-dimensional vector from the Cell, a pRouter 
will transform it using a customizable function, which will dictate the rate 
at which the next level down in the hierarchy is expected to change. This 
transformed N-dimensional vector is passed to the underlying pSwitches. 
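
One possible form of such a customisable function is exponential smoothing, in which a rate parameter controls how quickly the lower level follows the incoming weights. This is an assumed example of the transformation, not the function used by CloudLightning; the name `smooth_impetus` and the `rate` parameter are hypothetical.

```python
from typing import List


def smooth_impetus(previous: List[float], incoming: List[float], rate: float = 0.5) -> List[float]:
    """Blend the previous Impetus with the incoming weights.

    rate = 0 ignores the incoming weights; rate = 1 adopts them immediately.
    """
    if len(previous) != len(incoming):
        raise ValueError("vectors must have the same dimension")
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must lie in [0, 1]")
    return [(1.0 - rate) * p + rate * w for p, w in zip(previous, incoming)]


# A pRouter moving a quarter of the way towards the weights sent by the Cell.
new_impetus = smooth_impetus([0.0, 1.0], [1.0, 0.0], rate=0.25)
```

A small rate makes the underlying pSwitches change slowly and smoothly, while a rate close to one propagates the wishes of the upper level almost unchanged.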


3.4.2.6 Mechanism to Send Weights from pSwitch to vRMs 
After receiving an updated N-dimensional vector from the pRouter, a 
pSwitch will transform it using a customizable function, which will dictate 
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the rate at which the next level down in the hierarchy is expected to 
change. This transformed N-dimensional vector is passed to the underly- 
ing vRMs. 

The same weights are propagated to every component in the same level 
(in the same pRouter). This ensures that the underlying level does not 
return metrics that cannot be meaningfully compared at that level. For 
example, if the weights associated with the calculations of power efficiency 
in two different servers of the same type are grossly different, one will 
appear to be more power efficient than the other even if both are equally 
power efficient. 

Figure 3.4 depicts an example propagation of weights and metrics 
through the CL hierarchy in eight distinct time-steps. These vectors are 
propagated asynchronously from level to level. The metrics originate at 
the bottom level of the hierarchy, where they are derived from the appli- 
cation of CL-specific assessment functions applied to data gathered from 
the resource monitor. As they travel up through the hierarchy, they are 
aggregated to give successive perceptions of the underlying system at 
each successive component. The weights originate at the Cell and are
modified as they are passed down through the hierarchy to reflect
successive inflections of the Impetus coming from the Directed
Evolution.


3.4.2.7 A Mechanism in the Cell to Modify Local Behaviour 
in an Effort to Respond to Impetus Provided by the Directed 
Evolution and Metrics Coming from Attached pRouters 
Perception is a function such that: 


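$$\vec{P}_{Cell} = \varphi(\vec{m}_1, \vec{m}_2, \ldots, \vec{m}_r), \quad \vec{m}_1 \in \mathbb{R}^N, \; \vec{m}_2 \in \mathbb{R}^N, \ldots, \vec{m}_r \in \mathbb{R}^N \qquad (3.8)$$


Here, each $\vec{m}_i$ is a metric (an N-dimensional vector) coming from each
of the r pRouters attached to the Cell.

Impetus $\vec{I}_{Cell} = \mu(T_i)$, where $T_i$ is the task prescription under
consideration.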
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Fig. 3.4 An example propagation of weights and metrics through the CL hierar- 
chy, with respect to a resource prescription 
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3.4.2.8 A Mechanism in a pRouter to Modify Local Behaviour 
in an Effort to Respond to Impetus Transmitted by the Cell 
and Metrics Coming from Attached pSwitches 

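Perception is a function such that:


$$\vec{P}_{pRouter} = \varphi(\vec{m}_1, \vec{m}_2, \ldots, \vec{m}_s), \quad \vec{m}_1 \in \mathbb{R}^N, \; \vec{m}_2 \in \mathbb{R}^N, \ldots, \vec{m}_s \in \mathbb{R}^N \qquad (3.9)$$


Here, each $\vec{m}_i$ is a metric (an N-dimensional vector) coming from each
of the s pSwitches attached to the pRouter.

Impetus is a function such that:


$$\vec{I}_{pRouter} = \mu\big(\vec{I}'_{pRouter}, \vec{I}_{Cell}\big), \quad \vec{I}_{pRouter} \in \mathbb{R}^N, \; \vec{I}_{Cell} \in \mathbb{R}^N \qquad (3.10)$$


where $\vec{I}'_{pRouter}$ is the previous Impetus of the pRouter. Here
$\vec{I}_{Cell}$ represents the weight coming from the Cell.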


3.4.2.9 A Mechanism in a pSwitch to Modify Local Behaviour 
in an Effort to Respond to Impetus Transmitted by its pRouter 
and Metrics Coming from Attached vRMs

Perception is a function such that: 


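$$\vec{P}_{pSwitch} = \varphi(\vec{m}_1, \vec{m}_2, \ldots, \vec{m}_v), \quad \vec{m}_1 \in \mathbb{R}^N, \; \vec{m}_2 \in \mathbb{R}^N, \ldots, \vec{m}_v \in \mathbb{R}^N \qquad (3.11)$$


Here, each $\vec{m}_i$ is a metric (an N-dimensional vector) coming from each
of the v vRMs attached to the pSwitch.

Impetus is a function such that:


$$\vec{I}_{pSwitch} = \mu\big(\vec{I}'_{pSwitch}, \vec{I}_{pRouter}\big), \quad \vec{I}_{pSwitch} \in \mathbb{R}^N, \; \vec{I}_{pRouter} \in \mathbb{R}^N \qquad (3.12)$$


where $\vec{I}'_{pSwitch}$ is the previous Impetus of the pSwitch. Here
$\vec{I}_{pRouter}$ represents the weight coming from the pRouter.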
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3.4.2.10 A Mechanism in a vRM to Modify Local Behaviour 
in an Effort to Respond to Impetus Transmitted by its pSwitch 
and Metrics Coming from its vRack 

Perception is a function such that: 


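$$\vec{P}_{vRM} = \vec{m} = \varphi(\vec{d}), \quad \vec{d} \in \mathbb{R}^M \qquad (3.13)$$


where $\vec{d}$ represents M-dimensional Telemetry data obtained from the
Telemetry service running on the physical resources belonging to the
associated vRack.

Impetus is a function such that:


$$\vec{I}_{vRM} = \mu\big(\vec{I}'_{vRM}, \vec{I}_{pSwitch}\big), \quad \vec{I}_{vRM} \in \mathbb{R}^N, \; \vec{I}_{pSwitch} \in \mathbb{R}^N \qquad (3.14)$$


where $\vec{I}'_{vRM}$ is the previous Impetus of the vRM. Here
$\vec{I}_{pSwitch}$ represents the weight coming from the pSwitch.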


3.4.2.11 Sample Events that Trigger the Transmission of Metrics at each 
Level in the Hierarchy 
Options: 


• Periodically, at a rate suitable for that level in the hierarchy
• From the vRM to the pSwitch:

  - After the receipt of a task prescription
  - When resources are freed
  - As a result of a self-organisation activity
  - Periodically, to reflect utilisation, power consumption, and other
    low-level metrics of interest


3.4.2.12 Sample Events that Trigger the Transmission of Weights at Each 
Level in the Hierarchy 
Options: 


• As a result of steering
• Periodically, at a rate appropriate for each level in the hierarchy


3.4.3 Self-Organisation Mechanisms


vRMs self-organise within the same pSwitch to optimally manage
CL-Resources and to satisfy resource prescriptions, thus maximising their
SI and evolving towards the local goal state. Similarly, pSwitches can
self-organise within the same pRouter to maximise their SI to identify those
parts of the system that are evolving towards their local goals. In principle,
pRouters of the same CL-Resource type can also self-organise; however,
that level of re-organisation is not considered further here since the added
advantages are thought to be minimal. Example self-organisation scenarios
at each level are described below.


Within the vRMs


1. A task comes into the pSwitch.

2. The pSwitch sends the task to an attached vRM with the highest SI.

3. The vRM checks to see if it has sufficient resources to execute the
   task.

   (a) If yes, no further action is needed.
   (b) If no, the vRM initiates a self-organisation event within its
       cooperative.

4. The vRMs send updated metrics to their pSwitch.

Within the pSwitches


1. A task comes into the pRouter.

2. The pRouter sends the task to an attached pSwitch with the highest
   SI.

3. The pSwitch checks to see if there are sufficient resources to
   execute the task.

   (a) If yes, it passes the task to the vRM with the highest SI.
   (b) If no, the pSwitch initiates a self-organisation event within
       its cooperative.

4. The pSwitch sends updated metrics to its pRouter.
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Within the pRouter 


1. A task comes into the Cell.

2. The Cell sends the task to an attached pRouter with the highest SI
   of the desired type.

3. The pRouter checks to see if there are sufficient resources to
   execute the task.

   (a) If yes, it passes the task to the pSwitch with the highest SI.
   (b) If no, the pRouter initiates a self-organisation event within
       its cooperative.

4. The pRouter sends updated metrics to the Cell.
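
The routing pattern shared by the three scenarios above can be sketched as follows. The SI is computed here as a weighted sum of the reported metrics, which is an assumption made only for this example; the `Child` class, its `free_units` counter, and the `route` function are likewise hypothetical. In a real system the metrics would be refreshed after each placement (step 4).

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Child:
    name: str
    metrics: List[float]  # Perception reported by the child component
    free_units: int       # hypothetical count of free resource units


def suitability_index(weights: Sequence[float], metrics: Sequence[float]) -> float:
    # Assumed form of the SI: a weighted sum of the metrics.
    return sum(w * m for w, m in zip(weights, metrics))


def route(task_units: int, weights: Sequence[float], children: Sequence[Child]) -> Optional[Child]:
    """Send the task to the child with the highest SI.

    Returns the chosen child, or None when that child lacks resources, in which
    case the caller would initiate a self-organisation event.
    """
    best = max(children, key=lambda c: suitability_index(weights, c.metrics))
    if best.free_units >= task_units:
        best.free_units -= task_units
        return best
    return None


children = [Child("vRM-1", [0.5, 0.5], 4), Child("vRM-2", [1.0, 0.5], 2)]
first = route(2, [1.0, 1.0], children)   # placed on vRM-2, which has the highest SI
second = route(1, [1.0, 1.0], children)  # vRM-2 is now full, so None is returned
```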
Sample events that trigger re-organisation at each level in the hierarchy 


• When weights are updated.

• As a result of an autonomous, periodic, housekeeping action designed
  to maximise the SI of the initiating component.

• After the arrival of a resource prescription that cannot be satisfied
  without re-organisation.


When all else fails: sample resource prescription rejection strategies 


• Outright reject.

• Return the prescription to the previous level and possibly trigger a
  re-organisation there.

• Recycle the task prescription into the system at the Cell level and
  record its recycle iterations until an upper limit is reached. If this
  limit is reached, reject.
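
The third strategy, recycling with an upper limit, can be sketched as a bounded retry loop. The function `submit_with_recycling`, the `try_place` callback, and the default limit of three recycles are assumptions made for illustration.

```python
from typing import Callable


def submit_with_recycling(
    try_place: Callable[[str], bool],
    prescription: str,
    max_recycles: int = 3,
) -> bool:
    """Re-inject a prescription at the Cell level until it is placed or the limit is hit.

    Returns True if the prescription was placed and False if it was rejected.
    """
    recycles = 0
    while True:
        if try_place(prescription):
            return True
        if recycles >= max_recycles:
            return False  # upper limit reached: reject
        recycles += 1     # record the recycle iteration and try again


attempts = []


def placed_on_third_attempt(prescription: str) -> bool:
    # Stand-in for the Cell: succeeds only on the third submission.
    attempts.append(prescription)
    return len(attempts) == 3


accepted = submit_with_recycling(placed_on_third_attempt, "rp-1")
rejected = submit_with_recycling(lambda p: False, "rp-2", max_recycles=2)
```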


3.5 CLOUDLIGHTNING SOSM STRATEGIES 


3.5.1 Self-Management Strategies 


In the CloudLightning SOSM framework, each component is autonomous,
which allows each component to use a different self-management strategy
to achieve its local goal state.
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Some self-management strategies may include: 


• Static weights and dynamic weights (only for the Cell Manager)

• Average aggregation (suitable for pRouters, pSwitches, and vRMs)

• Modifying weights for smoothing changes towards the local goal state
  (suitable for pRouters, pSwitches, and vRMs)

• Bin-packing for energy efficiency (only for vRMs)

• Functions for management efficiency (only for vRMs)

• Isotropy preservation for task process parallelism (only for vRMs)


3.5.1.1 An Example Self-Management Scenario 

This example examines the effect of different choices of management cost
function. Four functions are selected for inspection, each characterising
a different type of evolution; they are described by the equations that
follow.


(a) Small vRacks 


i= Í ete (3.15) 


Equation 3.15 favours small-capacity vRacks, enabling them to evolve;
when a vRack has large capacity, the output of the management cost
function approaches zero, resulting in a reduced SI. Thus, large vRacks are
not capable of undertaking more requests, and they have to transfer their
servers to other, smaller vRacks in order to slowly achieve the ideal size.


(b) Large vRacks 


2(2 at Nora) 
N oral =2 


= ett (3.16) 


Equation 3.16 favours large capacity vRacks; when a vRack has small 
capacity, the output of the management cost function approaches zero 
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resulting in a reduced SI. Thus, small vRacks are not capable of undertak- 
ing more requests, and they have to transfer their servers to other larger 
vRacks merging with them. 


(c) Medium vRacks 


2 
{sta} 
N, total 


e — (3.17) 


Equation 3.17 favours medium capacity vRacks; when a vRack has very 
small or very large capacity, the output of the management cost function 
approaches zero resulting in a reduced SI. Thus, very small and very large 
vRacks are not capable of undertaking more requests, and they have to 
transfer their servers or merge with other vRacks. 


(d) Extreme vRacks 


2 
oras] 


total 


l-e 2 (3.18) 


Equation 3.18 favours very small capacity or very large capacity 
vRacks; when a vRack has medium capacity, the output of the manage- 
ment cost function approaches zero resulting in a reduced SI. Thus, 
medium capacity vRacks are not capable of undertaking more requests, 
and they have to transfer their servers or merge with other smaller or 
larger vRacks. 

Overall, the optimal number of servers per vRack is given by
$\hat{N}_{total} = \frac{1}{v} \sum_{i=1}^{v} (N_{total})_i$. This number is
dynamic and changes with the creation or destruction of vRacks and with
the merging or splitting of vRacks. The management cost functions are
depicted schematically in panels (a), (b), (c), and (d) of Fig. 3.5.

The choice of management cost function significantly affects the
evolution of the system as well as other parameters and metrics, such as
utilisation and the number of rejected resource prescriptions.
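
The qualitative behaviour of the four cases can be illustrated with the sketch below. The expressions are not Equations 3.15 to 3.18; they are assumed Gaussian-style shapes chosen only to reproduce the limiting behaviour described in the text, evaluated on a hypothetical ratio x of a vRack's size to the optimal size. The `width` parameter is also an assumption.

```python
import math


def favour_small(x: float) -> float:
    # Close to 1 for small vRacks, approaches 0 as the vRack grows.
    return math.exp(-x * x)


def favour_large(x: float) -> float:
    # Close to 0 for small vRacks, approaches 1 as the vRack grows.
    return 1.0 - math.exp(-x * x)


def favour_medium(x: float, width: float = 0.5) -> float:
    # Peaks when the vRack is at the optimal size (x = 1).
    return math.exp(-(((x - 1.0) / width) ** 2))


def favour_extreme(x: float, width: float = 0.5) -> float:
    # Zero at the optimal size, close to 1 for very small or very large vRacks.
    return 1.0 - favour_medium(x, width)


# x is the assumed ratio (servers in this vRack) / (optimal servers per vRack).
for x in (0.1, 1.0, 3.0):
    print(x, favour_small(x), favour_large(x), favour_medium(x), favour_extreme(x))
```

In each case an output close to zero lowers the SI of the vRack, which is what pushes it to transfer servers or merge.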
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Fig. 3.5 Different types of management cost functions 


3.5.2 Self-Organisation Strategies 


The self-organising components in the system include vRMs and 
pSwitches. vRMs self-organise within the same pSwitch to optimally 
manage CL-Resources and to satisfy resource prescriptions, thus maxi- 
mising their SI and evolving towards the local goal state. Similarly, 
pSwitches can self-organise within the same pRouter to maximise their 
SI to identify those parts of the system that are evolving towards their 
local goals. In principle, pRouters of the same CL-Resource type can 
also self-organise; however, that level of re-organisation is not consid- 
ered further in this book since the added advantages are thought to be 
minimal. 
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Some self-organisation strategies may include: 


• Dominate: the component with the greater SI has precedence and
  can demand that another component of the same type, but with a lower
  SI, transfer some resources.

• Win-Win: components may cooperate to exchange resources to
  maximise the SI of each.

• Least Disruptive: minimise disruption with respect to management
  and administration.

• Balanced: maximise load-balancing among the cooperating
  components.

• Best Fit: minimise server fragmentation and/or minimise network
  latency (this strategy may come from some vRM-specific
  objectives).

• Any meaningful combination of the above.


3.5.2.1 An Example Self-Organisation Scenario 

An example of a Least Disruptive algorithm that vRMs can use for
self-organisation is presented here. The algorithm allows vRMs to
exchange resources so as to minimise their management cost. It has two
stages: the first function attempts to minimise the number of
administrative actions each vRM performs, and the second function takes
virtualisation and fragmentation into account and can be used to avoid
the creation of very large vRMs, for reasons of management efficiency.
This two-stage self-organising scheme is described by the following
algorithm.
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Algorithm 1 


Let p be the minimum number of vRacks allowed per pSwitch
Let j be the index of the vRack with maximum Suitability Index
Let rp be a resource prescription arriving at vRM_j
Let ρ_j be the set of free resources belonging to vRM_j

function MINADMINCOSTS(rp)
    a = ∅
    t = ∅
    if |ρ_j| < rp then
        required = rp − |ρ_j|
        for i ← 1 to N_v with i ≠ j do
            send request to acquire free resources from vRM_i
            receive ρ_i from vRM_i
            a = a ∪ {i}
            t = t ∪ ρ_i
            required = required − |ρ_i|
            if required ≤ 0 then
                remove exceeding resources from t
                required = 0
                break
        send request to vRMs in a to acquire resources in t
        receive resource handlers from vRMs in a
    if |ρ_j| ≥ rp then
        return resource handlers to Gateway Service
    else
        create new vRM with resources ρ_j ∪ t
        return resource handlers to Gateway Service

function TWOSTAGESO(rp)
    if MINADMINCOSTS(rp) does not return resource handlers then
        if |ρ_j| < N̂ and N_v > p then
            for i ← 1 to N_v with i ≠ j do
                probe i-th vRM for resources, so that ρ_j ∪ ρ_i can service rp
                if ρ_j ∪ ρ_i can service rp then
                    merge vRM_i with vRM_j
                    return resource handler from resulting vRM to Gateway Service
        return rejection to Gateway Service
    else
        return resource handlers obtained from MINADMINCOSTS(rp) to Gateway Service
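
The first stage can be approximated in executable form as below. This is a simplified sketch rather than the CloudLightning implementation: free resources are modelled as sets of hypothetical identifiers held in a dictionary, message passing between vRMs is replaced by direct access, and the function returns None (instead of creating a vRM) when the cooperative cannot supply enough resources, so that a second stage could take over.

```python
from typing import Dict, Optional, Set


def min_admin_costs(rp: int, j: int, free: Dict[int, Set[str]]) -> Optional[Set[str]]:
    """Collect rp free resources for vRM j, borrowing from peers only if needed.

    free maps a vRM index to its set of free resource identifiers and is
    updated in place on success. Returns the allocated identifiers, or None.
    """
    own = sorted(free[j])
    if len(own) >= rp:
        chosen = set(own[:rp])
        free[j] -= chosen
        return chosen

    required = rp - len(own)
    borrowed: Dict[int, Set[str]] = {}
    for i in sorted(free):
        if i == j:
            continue
        # Take only what is still required, so no excess has to be returned.
        take = set(sorted(free[i])[:required])
        if take:
            borrowed[i] = take
            required -= len(take)
        if required == 0:
            break

    if required > 0:
        return None  # not enough free resources in the cooperative

    for i, taken in borrowed.items():
        free[i] -= taken
    free[j] = set()
    return set(own).union(*borrowed.values())


free_resources = {0: {"a"}, 1: {"b", "c"}, 2: {"d"}}
allocation = min_admin_costs(3, 0, free_resources)  # borrows "b" and "c" from vRM 1
shortfall = min_admin_costs(5, 2, free_resources)   # only "d" is left, so None
```

Visiting the peers in index order and stopping as soon as the requirement is met keeps the number of vRMs involved, and hence the number of administrative actions, small.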
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Figure 3.6 presents the increased system utilisation and requests reject 
rate of this two-stage self-organisation algorithm merging with the mini- 
mum free resources. However, because the system accommodates larger 
tasks through merging, the smaller tasks arriving at the system are con- 
tinuously rejected due to lack of resources. 

In the case of merging with the vRack with maximum free resources, 
the utilisation of the system, depicted in Fig. 3.7a, is slightly increased but 
oscillates around 80%. As a consequence, the percentage of rejected 
requests increases, since the system is accommodating an increased num- 
ber of larger requests, as schematically represented in Fig. 3.7b. 

Overall, this two-stage self-organisation strategy has been employed for 
enhancing utilisation and reducing fragmentation with virtualisation in 
mind. 


3.6 CONCLUSION


The SOSM framework described in this chapter provides a general and 
scalable mechanism for hosting and executing SOSM strategies that, in 
principle, could be associated with any hierarchical architecture. 

The key elements of the self-management and self-organisation frame- 
work include the process of Directed Evolution; an Impetus that drives the 
evolutionary process at all levels in the hierarchy; a Perception, associated 
with each component, indicating the effectiveness of the system underlying 
that component; and an SI, associated with each component, that
determines how close that component is to achieving its goal state. The
specification of an objective global goal state may be based on business
decisions and/or technology constraints; however, to optimise the
CloudLightning system in its entirety, it is suggested that the goal
states for components of the system be chosen to maximise their
respective SIs.

This approach introduces a great deal of flexibility into the evolution of
a system by allowing it to achieve stasis while attempting to balance local
constraints with the external Impetus derived from the directed
evolutionary process. Over time, the system as a whole evolves to optimise
typical service usage and to achieve dynamic equilibrium. The local
constraints are most evident at the vRM level, where they are embodied in
assessment functions capturing the essential characteristics of the
underlying resources.

The framework endows the system being specified with the flexibility to 
extend the resource fabric in a seamless fashion. This elegantly addresses 
the CloudLightning objective of readily supporting heterogeneous hard- 
ware now and into the future. 
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Fig. 3.6 The system utilisation (a) and requests reject rate (b) of two-stage self- 
organisation algorithm merging with the minimum free resources (p = 3) 
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Fig. 3.7 The system utilisation (a) and requests reject rate (b) of two-stage self- 
organisation algorithm merging with the maximum free resources (p = 3) 
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CHAPTER 4 


Application Blueprints and Service 
Description 


Ioan Dragan, Teodor-Florin Fortis, Marian Neagul,
Dana Petcu, Teodora Selea, and Adrian Spataru


Abstract In the context of creating a self-organising and self-managing
cloud infrastructure, we propose a set of extensions to the existing
Service Description Languages (SDLs) and Application Blueprints in order
to establish a common ground for the various CloudLightning components.
By implementing this SDL and all the missing links, one can ensure that
the CloudLightning system works in such a way that users can easily
interact with it. In this chapter, we present in detail the design
decisions that were made during the development of the various
components, along with their formal description.
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4.1 INTRODUCTION 


Delivering the quality of service (QoS) expected by end users on a
distributed multi-tenant infrastructure requires careful management of
computing resources. This is particularly the case where usage is growing
rapidly, as in cloud computing. Cloud service providers (CSPs) are faced
with a myriad of challenges in meeting the needs of a large and diverse 
range of end users including, but not limited to, service transparency, 
automated service provisioning, efficiently managing workload segmenta- 
tion and portability, and managing virtual services instances at one level, 
while optimising the utilisation of all resources at a different level (Sun 
et al. 2012). These issues can be addressed through specialised and precise
cloud service specification models, namely Service Description Languages
(SDLs), which describe cloud services, their deployment specifications, and
the resources required to run them. The majority of the existing SDLs
and associated frameworks implement tools, Application Programming 
Interfaces (APIs), and strategies for managing the lifecycle of cloud appli- 
cations and/or resources, and they are usually provided as a self-service 
interface to Enterprise Application Operators (EAOs). This self-service 
approach allows an EAO to have full control over the management of 
applications as well as the underlying resources such as virtual machines 
(VMs) and containers. It subsequently narrows down the opportunities 
for CSPs to improve resource utilisation and potentially the quality of 
services. 

The CloudLightning architecture endeavours to create a service- 
oriented architecture for the evolving heterogeneous cloud. In this respect, 
it is imperative to maintain a separation between application lifecycle man- 
agement and resource management. This separation of concerns imple- 
ments a “what-how” approach where the user concentrates on “what” 
needs to be done, while the CSP concentrates on “how” it should be 
done. With such an approach, it will be possible to implement continuous 
improvements, in terms of resource utilisation and service delivery, at the 
resource level. From this perspective, SDLs facilitate both (a) application 
lifecycle management by the user and (b) resource management by the 
CSP. As such, they ensure a proper separation of concerns between stake- 
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holders, a core design principle of CloudLightning introduced in Chap. 1. 
Particular service offerings are captured in blueprints to assist end users to 
discover and select from an increasing catalogue of services and determine 
an optimal, and potentially heterogeneous, set of resources to implement 
them. The remainder of this chapter is organised as follows. The next sec- 
tion provides an overview of two representative application lifecycle frame- 
works and one representative resource management framework. This is 
followed by an overview of the specific stakeholders whose concerns are of 
interest to CloudLightning. The CloudLightning approach to separation 
of concerns is then described followed by the Gateway Service and its 
functionalities. A formal definition of the CloudLightning Service
Description Language (CL-SDL) is provided in Sect. 4.4 followed by an 
exemplar implementation. This chapter concludes with a summary and 
future work on the components and concepts presented in the chapter. 


4.2 REPRESENTATIVE APPLICATION LIFECYCLE 
AND RESOURCE MANAGEMENT FRAMEWORKS 


In order to identify concerns about the classical, vertical management 
approach to cloud computing application lifecycle and resource manage- 
ment, three representative frameworks are used for illustrative purposes: 
OpenStack Solum, Apache Brooklyn, and OpenStack Heat. 

The cloud application lifecycle management architecture is represented 
in Fig. 4.1, using OpenStack Solum and Apache Brooklyn frameworks for 
Platform as a Service (PaaS) cloud, and resource lifecycle management 
using OpenStack Heat mainly for Infrastructure as a Service (IaaS) cloud. 

Project Solum and Apache Brooklyn allow the user to deploy a cloud 
application or a group of cloud applications previously described in a blue- 
print, using an SDL. The main purpose of such an SDL is to provide a way 
of expressing the management processes for cloud applications. Depending 
on the actual implementations, this may include providing the ability for 
describing the characteristics of the application components, deployment
scripting, dependencies, locations, logging, policies, and so on.

In the case of OpenStack Solum, the engine takes a blueprint as an 
input and converts it to a Heat Orchestration Template (HOT) that can 
be understood by the application and resource management engine 
(OpenStack Heat). The Heat engine, thereafter, calls the corresponding 
service APIs that are offered by the cloud infrastructure framework such 
as OpenStack.

Fig. 4.1 Lifecycle management for OpenStack Solum, Apache Brooklyn, and
OpenStack Heat


In contrast, Apache Brooklyn converts a blueprint into a series of API
calls (specifically, jclouds APIs) that can be used to directly contact the
underlying cloud infrastructure. For example, these calls may reach the
cloud infrastructure with a request to create a VM in OpenStack; the
OpenStack Nova API service will capture the request and send it to
nova-scheduler, which, in turn, decides on the physical server on which
the VM should be started. This approach is based on a request-response
pattern, providing a simple, robust, and efficient implementation.
However, as each request is processed independently, when a blueprint
specifies, for example, placement constraints based on the vicinity of
resources, such constraints are hard to capture and fully implement
through APIs with a vertical approach.


4.3 CLOUDLIGHTNING STAKEHOLDERS AND ASSOCIATED 
CONCERNS 


Separation of concerns requires the identification of stakeholders and their
associated concerns. For illustrative purposes, three distinct entities are
identified, each with differing concerns: end users, Enterprise Application
Operators and Developers (EAO/EAD), and IaaS resource providers (CSPs).
The end user is the consumer of an application and/or service.
As such, their concerns are primarily related to cloud application continu- 
ity, availability, performance, security, and business logic correctness. The 
EAO/EAD has traditional enterprise concerns, for example, cloud appli- 
cation configuration management, performance, load balancing, security, 
availability, and the deployment environment. As discussed in Chap. 1, the 
CSP’s business model is driven by cost effectiveness and scalability while at
the same time delivering the contracted service level. As such, their con- 
cerns are primarily related to optimisation including resource availability, 
operating costs (including power consumption), resource provisioning, 
resource organisation, and partitioning (if applicable). 

Under separation of concerns, each entity manages their own concerns, 
to the extent that they can. Notwithstanding this, some concerns exist 
across the entities. For example, in order to realise high availability, an 
EAO may need to configure a load-balancer, while at the same time a CSP 
must implement a host-affinity policy. 


4.4 THE CLOUDLIGHTNING APPROACH BASED
ON SEPARATION OF CONCERNS 


4.4.1 CloudLightning Requirements 


As discussed, the CloudLightning service delivery model depicted in
Fig. 4.2 is a blueprint-based one. In contrast to existing frameworks, this
service delivery model provides facilities for blueprint developers to specify
comprehensive constraints and quality of service parameters for services
and/or resources in the scope of a blueprint, by means of a specific SDL
(the CL-SDL). Based on the specified constraints and parameters, it is
then possible to provide an initial optimal deployment of the resources, a
capability which has not been accomplished by previous solutions: for
example, by placing resources (such as VMs) on adjacent physical servers
to minimise communication delay, or by allocating containers that have
Graphics Processing Units (GPUs) or Xeon Phis attached to them to balance
performance against cost.

Fig. 4.2 CloudLightning service delivery model

More importantly, in order to separate the concerns of cloud applica- 
tion lifecycle management and the resource lifecycle management, a 
CloudLightning-specific blueprint (CL-Blueprint) must be decomposed 
into two separate and interrelated blueprints, the first one for resource 
management (offering the Resource Template) and the other one for 
application/workflow management (defining  framework-specific 
templates). This process is shown in Fig. 4.3. It also implies that the
CL-SDL shall be developed in such a way that a CL-Blueprint described
in the CL-SDL can be transformed to framework-specific blueprints
without losing generality.

Fig. 4.3 Architecture for CloudLightning service delivery

A CL-Blueprint deployment starts by sending the raw Resource
Template to a Resource Discovery component and a Resource Selection 
component, which are the two main components of a complementary 
system (in this situation, the CloudLightning Self-Organising and Self- 
Management [SOSM] framework), for optimal resource identification in 
the scope of a blueprint, as indicated in Fig. 4.3. Once the optimal resource 
identification process has finished, the initially received Resource Template 
must be reconstructed in order to embed the received resource optimisa- 
tion information and consequently send it to the resource lifecycle man- 
agement engine, which will carry out the actual resource deployment on 
the infrastructure it manages. 

In addition, some of the optimisation information (e.g., the physical
server on which a VM should be allocated) must be embedded into
resource requests (API calls), and this special information must be
captured by the lower infrastructure management components.

The returns from the deployment process are the resource handlers 
(e.g., a resource handler can be a login account with username, access key, 
and Internet Protocol address to a VM, a container, a bare metal machine 
with pre-installed operating system, or an existing High Performance 
Computing [HPC] cluster). These resource handlers will then be returned 
to the Gateway Service, which will reformulate the original workflow/ 
application blueprint along with the resource handlers. 

The newly formulated workflow/application blueprint will then be 
submitted to the corresponding workflow/application lifecycle manage- 
ment framework to carry out the deployment of the cloud applications on 
these pre-provisioned resources. This process is shown in Fig. 4.3. At this
point, the CL-Blueprint deployment process is complete.
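
The deployment flow described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names (`split_blueprint`, `discover_and_select`, `provision`, `reformulate`), the dictionary keys, and the addresses are hypothetical stand-ins for the Gateway Service, the SOSM framework, and the resource lifecycle management engine, and are not CloudLightning APIs.

```python
from typing import Dict, List, Tuple


def split_blueprint(cl_blueprint: Dict) -> Tuple[Dict, Dict]:
    """Decompose a CL-Blueprint into a Resource Template and an application blueprint."""
    resource_template = {"resources": cl_blueprint["resources"]}
    app_blueprint = {"application": cl_blueprint["application"]}
    return resource_template, app_blueprint


def discover_and_select(resource_template: Dict) -> Dict:
    """Stand-in for SOSM Resource Discovery and Selection: pin each resource to a host."""
    optimised = []
    for index, resource in enumerate(resource_template["resources"]):
        # Hypothetical placement hint; a real system would derive it from system state.
        optimised.append({**resource, "host": f"server-{index}"})
    return {"resources": optimised}


def provision(optimised_template: Dict) -> List[Dict]:
    """Stand-in for the resource lifecycle engine: one resource handler per resource."""
    return [
        {"name": r["name"], "address": f"10.0.0.{i + 10}", "username": "user"}
        for i, r in enumerate(optimised_template["resources"])
    ]


def reformulate(app_blueprint: Dict, handlers: List[Dict]) -> Dict:
    """Attach the resource handlers to the application blueprint before deployment."""
    return {**app_blueprint, "resource_handlers": handlers}


def deploy(cl_blueprint: Dict) -> Dict:
    resource_template, app_blueprint = split_blueprint(cl_blueprint)
    handlers = provision(discover_and_select(resource_template))
    return reformulate(app_blueprint, handlers)


example = {
    "application": {"name": "raytracer"},
    "resources": [{"name": "vm-1", "kind": "VM"}, {"name": "c-1", "kind": "container"}],
}
result = deploy(example)
```

The point of the sketch is the ordering: resources are optimised and provisioned first, and only then is the application blueprint completed with the resulting handlers and handed to the application lifecycle framework.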

Notice that this service delivery model is much more sophisticated
than the current self-service model, which uses a vertical
management approach, because the cloud application management and the
resource management operate independently. Moreover, in certain
circumstances the cloud application management layer needs to exchange
information with the resource management layers (e.g., when ending
the lifetime of a CL-Blueprint, a notification needs to be sent to the
resource management layer so that the underlying resources can be reused
or decommissioned).

In order to align with the design of the bespoke service delivery model,
and implement the separation of concerns, the specific SDL shall be
developed with the following capabilities:


1. To describe characteristics of a cloud application

2. To describe cloud application execution environment and
dependencies

3. To specify cloud application deployment processes

4. To specify resource type and resource requirements

5. To express constraints between blueprint service elements

6. To express quality of service parameters for each individual blueprint
service element

7. To accommodate extensions for supporting specific/non-traditional
cloud applications such as HPC applications

8. To fulfil the above requirements without losing generality


4.4.2 Separation of Concerns 


During the lifetime of the CL-Blueprint, the EADs/EAOs are responsible 
for managing the cloud applications through specific frameworks, such as 
Apache Brooklyn and OpenStack Solum, while the CloudLightning 
SOSM system manages the underlying resources. A series of advantages of 
this approach may be then highlighted: 


• continuous improvement of the quality of CL-Blueprint services

• improved service delivery and user experience through the reuse of
resources that have already been provisioned

• resource optimisation and energy efficiency optimisation

• flexibility and extensibility when integrating other management systems
such as the OpenStack Mistral (Openstack.org 2017) workflow
management system


In CloudLightning, the functional components that realise the concept 
of the “separation of concerns” are shown in Fig. 4.4 with the following 
description. 


4.4.2.1 Application Lifecycle Management


• Abstract Blueprint: used to represent specific application
requirements, constraints, and metrics defined by users, and to describe
the concrete and abstract services (referenced only by identification)
along with the collocation of the services.

• Blueprint: represents a fully qualified Cloud Application Management
for Platforms (CAMP) (Organization for the Advancement of
Structured Information Standards [OASIS] CAMP TC, 2014)
document containing references to real resource types, resource
locations, and deployment mechanisms, which are fully understood
and handled by a CAMP-compliant implementation.

• Service Catalogue: a persistent collection of versioned services,
each of which includes service information, deployment
information, and a CL-Resource specification.

• Service Decomposition Engine (SDE): handles the transformation of
Abstract Blueprints to concrete Blueprints according to the provided
requirements.

• Brooklyn: used for deploying and managing the applications via
Blueprints.

Fig. 4.4 CloudLightning implementation of the “separation of concerns”


4.4.2.2 Resource Lifecycle Management


• CL-SOSM Layer: the CloudLightning SOSM Layer aims to identify and
create/allocate the optimal CL-Resource for applications using the
principles of SOSM.

• CL-RA Layer: the CloudLightning Resource Abstraction Layer is used
for abstracting the CL-Resources in different ways (such as Bare
Metal, Virtualisation, Containerisation, and Direct Access) from
various hardware types (such as Central Processing Unit [CPU],
GPU, Data Flow Engine, and Many Integrated Core [MIC]).

• Heat Orchestration Template (HOT): describes the infrastructure
resources (such as servers, networks, routers, floating IPs, and volumes)
for a cloud application, as well as the relationships between resources.

• Heat Interface: automatically generates HOTs from the results
of the SOSM Layer or dynamically modifies HOTs based on the
results from the Continued Improvement component.

• Heat Engine: manages the whole lifecycle of the provisioning
process.

• Continued Improvement: this management component, together
with Heat and telemetry, continuously improves the
deployed blueprint during its lifetime.
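
To make the role of the Heat Interface more concrete, the following Python sketch generates a minimal HOT document from a list of placement results. It is a sketch under stated assumptions, not the CloudLightning implementation: the shape of the `placements` input and the function name `build_hot` are hypothetical, while `heat_template_version`, the `OS::Nova::Server` resource type, and the `name`, `flavor`, `image`, and `availability_zone` properties come from the OpenStack Heat template format.

```python
from typing import Dict, List


def build_hot(placements: List[Dict[str, str]]) -> Dict:
    """Build a HOT document (as a dictionary) with one server per placement result."""
    resources = {}
    for placement in placements:
        resources[placement["name"]] = {
            "type": "OS::Nova::Server",
            "properties": {
                "name": placement["name"],
                "flavor": placement["flavor"],
                "image": placement["image"],
                # Placement hint assumed to come from the SOSM Layer.
                "availability_zone": placement["zone"],
            },
        }
    return {
        "heat_template_version": "2015-04-30",
        "description": "Generated from placement results (illustrative)",
        "resources": resources,
    }


# Hypothetical placement results for two VMs.
hot = build_hot([
    {"name": "vm-1", "flavor": "m1.small", "image": "ubuntu", "zone": "zone-a"},
    {"name": "vm-2", "flavor": "m1.small", "image": "ubuntu", "zone": "zone-a"},
])
```

Serialising the resulting dictionary to YAML would give a template that the Heat Engine can process; regenerating it with different placement results corresponds to the dynamic modification mentioned above.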


4.5 THE CLOUDLIGHTNING GATEWAY ARCHITECTURE 


Integration of the use cases provided in CloudLightning with the Gateway 
Service will be done by following the CL-SDL (Xiong et al. 2016). The 
proposed CL-SDL specification is built on top of the OASIS CAMP speci- 
fication and introduces new concepts suitable for expressing the require- 
ments of HPC applications. 

The syntax of the CL-SDL is based on the Brooklyn blueprint YAML
(Yet Another Markup Language) and is used to describe the Resource
Template and the Resourced Blueprint. Both of these offer support for
CloudLightning Blueprint lifecycle management.

The Blueprint is used to represent specific application requirements,
constraints, and metrics defined by either the EAD or the EAO, and
describe services by name and their relationships. As depicted in Fig. 4.5,
service definitions are predefined by EADs in special catalogues that
follow the Cloud Service Archive (CSAR) specifications (Breiter et al. 2012),
a subset of rules defined by the Topology and Orchestration Specification
for Cloud Applications (TOSCA) standard (OASIS Open 2013).

The Resourced Blueprint is obtained from the SDE. This operation
effectively invokes the underlying CL-SOSM subsystem, which is responsible
for resource management, to obtain available resources and resource
definitions. The resulting Resourced Blueprint is completely supported by a
CAMP-compliant CAMP Provider (Carlson et al. 2012).¹

In the CL-Blueprint all references to CloudLightning-defined artefacts 
are removed, except for specific CloudLightning handles (opaque to the 
CAMP Provider). These handles are used for the creation of a session 
between the resource scheduling (self-organisation) layer and the deployed 
resources. This CL-Blueprint represents a fully qualified CAMP document
containing references to real resource types, resource locations, and
deployment mechanisms, which are fully understood and handled by a
CAMP-compliant implementation.


4.5.1 Gateway Service Architecture 


The CloudLightning Gateway Service builds upon the capabilities of the 
Apache Brooklyn solution, providing “service decomposition” capabilities. 
The Gateway Service completely reuses the rest of the features provided 
by Apache Brooklyn, facilitating the reuse of existing Blueprints and inte- 
gration. Of particular interest is the integration with various Configuration 
Management Systems like Puppet, Chef, or Ansible (Fig. 4.6). 

The Gateway Service has several roles, as follows: 


1. Receive/create abstract² Blueprint definitions from the EAO.

2. Decompose the received Abstract Blueprint into individual services.
For each of the services, check if it is a fully qualified service or has to
be further processed. This operation is further discussed in Sect.
4.5.2 (Service Decomposition).

3. Once the Blueprint is fully qualified (it does not contain any abstract
service definitions), the Gateway Service triggers the deployment of the
services and their further execution.

Fig. 4.6 Gateway Service overall architecture


The Gateway Service exposes a series of APIs usable by consumers 
(EAOs and EADs) for controlling the application lifecycle. 


4.5.2 Service Decomposition 


The operation of Service Decomposition is implemented by the SDE and 
represents one of the core CloudLightning contributions in the Gateway 
Service. The SDE is responsible for the interaction with the SOSM subsys- 
tem. The overall operation of the SDE can be summarised as follows: 


1. For each service, check whether it can be instantiated directly (there
exists a single implementation of the service, and that implementation is
well known to the Gateway Service) or whether it is an abstract service
(a service interface that could be implemented by several
implementations).

2. If the service is an abstract service, the SDE contacts the backend
SOSM system to select the proper implementations for the
service.

3. In order to facilitate the selection of the proper implementation, the
SDE transmits the user-provided requirements (in the form of
ClassAd [Solomon 2003] definitions). These requirements are used
by the SOSM subsystem to select the right
implementations.

4. The selection of concrete implementations results in modifying the
original Blueprint, by replacing the abstract definition with the
resourced one (possibly after a user interaction for validating the
right solution) and submitting the Blueprint to the next stage.
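
The four steps above can be sketched as follows. The code is a minimal illustration, assuming a blueprint represented as a Python dictionary; the names `CONCRETE_TYPES`, `is_abstract`, `decompose`, and `stub_sosm` are hypothetical, and the SOSM back end is replaced by a stub that always proposes the same implementation.

```python
from typing import Callable, Dict, List

# Hypothetical catalogue of service types that can be instantiated directly.
CONCRETE_TYPES = {
    "brooklyn.entity.webapp.ControlledDynamicWebAppCluster",
    "cloudlightning.entity.impl.HPCCluster",
}

Selector = Callable[[str, List[Dict]], str]


def is_abstract(service: Dict) -> bool:
    """A service is abstract when its type has no single known implementation."""
    return service["type"] not in CONCRETE_TYPES


def decompose(blueprint: Dict, select_implementation: Selector) -> Dict:
    """Return a copy of the blueprint with every abstract service resourced."""
    services = []
    for service in blueprint["services"]:
        if is_abstract(service):
            # Forward the user-provided requirements to the (stubbed) SOSM system.
            requirements = service.get("service-requirements", [])
            chosen = select_implementation(service["type"], requirements)
            service = {**service, "type": chosen}
        services.append(service)
    return {**blueprint, "services": services}


def stub_sosm(abstract_type: str, requirements: List[Dict]) -> str:
    """Stand-in for the SOSM back end; always proposes the same implementation."""
    return "cloudlightning.entity.impl.HPCCluster"


abstract_blueprint = {
    "id": "example",
    "services": [
        {
            "type": "cloudlightning.entity.meta.RaytracingApp",
            "service-requirements": [{"type": "classad", "requirement": 'Arch=="Intel"'}],
        },
        {"type": "brooklyn.entity.webapp.ControlledDynamicWebAppCluster"},
    ],
}
resourced = decompose(abstract_blueprint, stub_sosm)
```

The original blueprint is left untouched and a resourced copy is returned, which makes it straightforward to show both versions to the user when a validation step is required.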


4.5.3 Interaction with the SOSM System 


After the successful query of available implementations for each abstract
service definition, the SDE component constructs a Resource Template
containing information about the specific requirements of each
implementation. An example of such a Resource Template is given in Listing 4.1.

Consider a Blueprint containing a single service in order to maintain 
better readability of the listing. Such a document contains a blueprint ID 
that is unique for each request, a timestamp representing the request time, 
a cost limit for the entire Blueprint, and the callback endpoint used by the 
SOSM system to communicate back results of the optimisation steps. 

The sample service has two implementation options between which the 
SOSM will choose depending on their constraints and the overall cost of 
the blueprint. The first one refers to the need for a single VM with a single 
core (expressed by a computation range between 1 and 1), 1000 MB of 
memory, 50 GB of storage, bandwidth between 100 Mbps and 1 Gbps, 
and no accelerators. 

The second implementation is of type MIC-CONTAINER, requiring 
the CellManager to find or create a container, which has access to an MIC 
accelerator. This service requires one container with one CPU core, mem- 
ory between 100 and 1000 MB, storage between 10 and 50 GB, the same 
bandwidth as the other implementation, and one MIC accelerator. 
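
As an illustration of the structure just described, the sketch below assembles the same Resource Template in Python and serialises it to JSON. The helper names (`as_range`, `implementation`) and the blueprint identifier `bp-42` are hypothetical; the field names and values mirror the description above.

```python
import json
from typing import Dict, List, Tuple

Range = Tuple[int, int]


def as_range(bounds: Range) -> List[int]:
    """Validate a (minimum, maximum) pair and return it as a JSON-friendly list."""
    low, high = bounds
    if low < 0 or low > high:
        raise ValueError(f"invalid range: {bounds}")
    return [low, high]


def implementation(impl_type: str, cores: Range, memory_mb: Range, storage_gb: Range,
                   bandwidth_mbps: Range, accelerators: Range, units: int = 1) -> Dict:
    """Describe one implementation option of a service element."""
    return {
        "implementationType": impl_type,
        "requiredResourceUnit": units,
        "computationRange": as_range(cores),
        "memoryRange": as_range(memory_mb),
        "storageRange": as_range(storage_gb),
        "bandwidthRange": as_range(bandwidth_mbps),
        "acceleratorRange": as_range(accelerators),
    }


template = {
    "blueprintId": "bp-42",  # hypothetical identifier
    "timestamp": 1929292,
    "cost": 0.0,
    "callbackEndpoint": "http://10.0.0.1/sde/rest/blueprints/bp-42",
    "serviceElements": [{
        "serviceElementId": "service-elem-1",
        "implementations": [
            implementation("CPU-VM", (1, 1), (1000, 1000), (50, 50), (100, 1000), (0, 0)),
            implementation("MIC-CONTAINER", (1, 1), (100, 1000), (10, 50), (100, 1000), (1, 1)),
        ],
    }],
}
body = json.dumps(template, indent=2)
```

Expressing every requirement as a range, even when the minimum and maximum coincide, keeps the two implementation options structurally identical, so they can be compared field by field.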


4.5.3.1 Resource Discovery 
The Gateway Service and the SOSM system exchange information for two 
operations: resource discovery and resource release. 


• Resource discovery is the operation by which the SOSM system
chooses the most suitable service implementation and the resources
on which to deploy it, according to user constraints and system state.

• Resource release is the operation by which the SOSM system is
informed that the services have been terminated, so the underlying
resources may be reallocated.

    {
      "blueprintId": "{bpId}",
      "timestamp": 1929292,
      "cost": 0.0,
      "callbackEndpoint": "http://10.0.0.1/sde/rest/blueprints/{bpId}",
      "serviceElements": [
        {
          "serviceElementId": "service-elem-1",
          "implementations": [
            {
              "implementationType": "CPU-VM",
              "requiredResourceUnit": 1,
              "computationRange": [1, 1],
              "memoryRange": [1000, 1000],
              "storageRange": [50, 50],
              "bandwidthRange": [100, 1000],
              "acceleratorRange": [0, 0]
            },
            {
              "implementationType": "MIC-CONTAINER",
              "requiredResourceUnit": 1,
              "computationRange": [1, 1],
              "memoryRange": [100, 1000],
              "storageRange": [10, 50],
              "bandwidthRange": [100, 1000],
              "acceleratorRange": [1, 1]
            }
          ]
        }
      ]
    }

Listing 4.1 Resource template


The aforementioned operations are modelled by Hypertext Transfer 
Protocol (HTTP) Representational State Transfer (REST) methods, both 
the Cell Manager and the SDE acting as REST servers. 

Figure 4.7 describes the protocol for resource discovery. It starts with a
POST request whose body contains a Resource Template with the structure
illustrated in Listing 4.1. If the Cell Manager encounters any problems
during the parsing of the body, the status code of the response will be 409
Conflict. Otherwise, the status code will be 201 Created and the resource
discovery process will start. The Cell Manager is in charge of informing
the SDE when the result is ready.

[Sequence diagram: (1) a POST request to the Cell Manager blueprint
endpoint; the Cell Manager analyses the request format and answers with
STATUS 201 CREATED or STATUS 409 CONFLICT; (2) POST /sde/blueprints/results,
answered with STATUS 201 CREATED.]

Fig. 4.7 Resource discovery sequence diagram
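
The status-code behaviour of the discovery request can be sketched as a small validation function. This is a hedged illustration only: the function name `handle_discovery_request` and the set of required keys are assumptions based on the Resource Template fields described earlier, not the actual Cell Manager code.

```python
import json
from typing import Tuple

# Assumed minimal set of top-level fields of a Resource Template.
REQUIRED_KEYS = {"blueprintId", "timestamp", "cost", "callbackEndpoint", "serviceElements"}


def handle_discovery_request(body: str) -> Tuple[int, str]:
    """Return the (status code, reason) that would answer a discovery POST."""
    try:
        template = json.loads(body)
    except json.JSONDecodeError:
        return 409, "Conflict"
    if not isinstance(template, dict) or not REQUIRED_KEYS.issubset(template):
        return 409, "Conflict"
    # Parsing succeeded: discovery starts and the result is reported later
    # through the callback endpoint.
    return 201, "Created"


valid_body = json.dumps({
    "blueprintId": "bp-42",
    "timestamp": 1929292,
    "cost": 0.0,
    "callbackEndpoint": "http://10.0.0.1/sde/rest/blueprints/bp-42",
    "serviceElements": [],
})
accepted = handle_discovery_request(valid_body)
rejected = handle_discovery_request("not json")
```

Note that 201 Created only acknowledges that the request was accepted; the outcome of the discovery itself arrives asynchronously in the second POST of the protocol.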

When resources have been identified for all services, the Cell Manager
will use a POST request with the body containing the information about
the placement and implementation of each service, referred to as a Resourced
Template. This will trigger the SDE to instantiate each abstract service and
update the Blueprint with concrete services and resource access
information. An example result is shown in Listing 4.2. The chosen
implementation is CPU-VM, and the resource type is OPENSTACK ACCOUNT,
meaning that the SOSM is managing an OpenStack cluster as a resource.
In this case, access information consists of credentials for accessing the
OpenStack Nova API in order to create the VM.
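
One detail worth illustrating is that the resource descriptor in a Resourced Template is itself a JSON document embedded as a string, so it has to be decoded in a second step. The sketch below assumes the field names shown in Listing 4.2; the function name `access_info` and the identifier `bp-42` are hypothetical.

```python
import json
from typing import Dict, List

# The descriptor is serialised separately and embedded as a string.
descriptor = json.dumps({
    "platform": "OPENSTACK",
    "domain": "SOSM",
    "project": "CL-SOSM",
    "username": "cl-admin",
    "password": "s3cret",
    "authEndpoint": "http://10.0.2.19:5000/v3.0",
})

resourced_template = {
    "blueprintId": "bp-42",  # hypothetical identifier
    "status": "SUCCESSFUL",
    "resourcedServiceElements": [{
        "serviceElementId": "service-elem-1",
        "implementationType": "CPU-VM",
        "status": "COMPLETED",
        "resourceType": "OPENSTACK ACCOUNT",
        "resources": [{
            "resourceCreationId": "1234-5567-82929",
            "resourceDescriptor": descriptor,
        }],
    }],
}


def access_info(template: Dict) -> Dict[str, List[Dict]]:
    """Map each completed service element to its decoded resource descriptors."""
    result: Dict[str, List[Dict]] = {}
    for element in template["resourcedServiceElements"]:
        if element["status"] != "COMPLETED":
            continue
        result[element["serviceElementId"]] = [
            json.loads(resource["resourceDescriptor"])
            for resource in element["resources"]
        ]
    return result


info = access_info(resourced_template)
```

Keeping the descriptor opaque at the outer level allows different resource types to carry different access information without changing the structure of the Resourced Template.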


4.5.3.2 Resource Release 

The protocol for releasing the resources associated with a Blueprint is
depicted in Fig. 4.8. A DELETE request is made to the Cell Manager at
a path referencing the Blueprint ID. In case of successful resource release,
the response will have the status 204 No Content. Otherwise, the
response will have status 400 Bad Request and the body should provide
useful information that will be propagated to the user interface (UI).

    {
      "blueprintId": "{bpId}",
      "timestamp": 1929392,
      "status": "SUCCESSFUL",
      "resourcedServiceElements": [
        {
          "serviceElementId": "service-elem-1",
          "implementationType": "CPU-VM",
          "creatorId": "vrm-1",
          "status": "COMPLETED",
          "resourceType": "OPENSTACK ACCOUNT",
          "resources": [
            {
              "resourceCreationId": "1234-5567-82929",
              "resourceDescriptor": "{\"platform\": \"OPENSTACK\",
                \"domain\": \"SOSM\", \"project\": \"CL-SOSM\",
                \"username\": \"cl-admin\", \"password\": \"s3cret\",
                \"authEndpoint\": \"http://10.0.2.19:5000/v3.0\"}"
            }
          ]
        }
      ]
    }

Listing 4.2

[Sequence diagram: a DELETE request to the Cell Manager triggers the
resource decommission.]

Fig. 4.8 Resource release sequence diagram
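
The release protocol can be mimicked with a small in-memory stand-in for the Cell Manager. The class and method names below (`CellManagerStub`, `allocate`, `delete_blueprint`) are hypothetical; only the status codes 204 and 400 are taken from the description of the protocol.

```python
from typing import Dict, List, Tuple


class CellManagerStub:
    """In-memory stand-in that tracks which resources belong to which Blueprint."""

    def __init__(self) -> None:
        self.allocations: Dict[str, List[str]] = {}

    def allocate(self, blueprint_id: str, resources: List[str]) -> None:
        self.allocations[blueprint_id] = list(resources)

    def delete_blueprint(self, blueprint_id: str) -> Tuple[int, str]:
        """Mimic the DELETE request: return (status code, response body)."""
        if blueprint_id not in self.allocations:
            # The body carries information that can be shown in the UI.
            return 400, f"no resources registered for blueprint {blueprint_id}"
        del self.allocations[blueprint_id]  # resources may now be reallocated
        return 204, ""


manager = CellManagerStub()
manager.allocate("bp-42", ["vm-1"])
first = manager.delete_blueprint("bp-42")
second = manager.delete_blueprint("bp-42")
```

In this sketch a repeated release of the same Blueprint is reported as an error, which is one plausible reading of the protocol; the source does not state how repeated requests are handled.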


4.6 THE CLOUDLIGHTNING BLUEPRINT EXTENSIONS


Below is a summary of the technologies upon which the CloudLightning 
Blueprints were developed. 


4.6.1 CloudLightning Brooklyn Extensions


As part of the CloudLightning project, Apache Brooklyn was adopted and
extended as the underlying platform for achieving the project’s ultimate
goal of both supporting HPC applications and adopting modern cloud
technologies, thus creating a bridge between the HPC and Cloud end
user communities.

The decision to use the Apache Brooklyn framework is motivated by 
the design decisions established in the conceptualisation of the 
CloudLightning architecture (Morrison et al. 2016), the CloudLightning 
protocol specification and APIs (Neagul et al. 2016), and the Gateway 
Service (Dragan et al. 2017). 

The main advantages of using Apache Brooklyn include: 


1. It provides the building blocks needed for developing the necessary 
functionality expected from the Gateway Service. 

2. It offers support for “automatic blueprints” based on OASIS CAMP, 
an extensible specification that can serve as the core specification for 
the CloudLightning Blueprints. 

3. The Apache Brooklyn project plans to support TOSCA in the near future.³
This could potentially allow further developments in the
CloudLightning SDL, supporting the TOSCA standard (OASIS
Open 2013).

4. Existing Apache Brooklyn Blueprints can be harnessed, providing HPC
vendors with more choices without requiring more development effort.


The purpose of this section is to discuss how the adoption of the
Brooklyn Blueprints, particularly the expected additions to the Blueprint
YAML, is envisioned in CloudLightning. As previously noted, two different
kinds of blueprints are identified for use in CloudLightning: Abstract
Blueprints and Concrete Blueprints (hereafter referred to as “Blueprints”).
Both types of Blueprints are built on top of Apache Brooklyn blueprints.

The translation from the Abstract Blueprint to runnable Blueprints
is performed by a specialised component residing inside the
Gateway Service, named the “Service Decomposition Engine.”
The decomposition engine is responsible for interacting with the SOSM
infrastructure (Fig. 4.9).

Each of the two types of Blueprints is discussed in the following
sections, outlining the changes to the vanilla (plain) Brooklyn Blueprints.
Note that the proposed extensions are subject to change as other parts of
the CloudLightning Project evolve and might also be influenced by
outside changes in the Apache Brooklyn project, as, for example, the addition
of new functionality or deprecation of a current one.

[Diagram: Gateway Service Console (UI), Abstract Blueprint, Gateway
Service, Blueprint, Brooklyn, and the Self-Organization and Self
Management (SOSM) Layer.]

Fig. 4.9 CloudLightning Blueprint decomposition process


4.6.2 CloudLightning Abstract Blueprint 


The Abstract Blueprint is represented by an extended version of the 
Apache Brooklyn Blueprint, containing attributes holding CloudLightning- 
specific entries, as described in Listing 4.3. 

In this example, the Abstract Blueprint requires the deployment of a
Java web application and a computing resource providing raytracing
capabilities. Of interest in this case is the abstract computing service
identified by the name “RaytracingApplicationId”: the service cannot be
directly handled by the Apache Brooklyn framework as it does not
provide the required information (the cloudlightning.entity.meta.RaytracingApp
type is not known to Brooklyn).

    id: jetty-node-with-raytracing-compute
    name: "Jetty Application With Raytracing Computing Resource"
    origin: http://cloudlightning.io/
    locations:
    - cloudlightning-openstack
    services:
    - type: cloudlightning.entity.meta.RaytracingApp
      name: RaytracingApplicationId
      location: cloudlightning-openstack
      cloudlightning.config:
        service-requirements:
        - type: classad
          requirement: 'Arch=="Intel" && CoProcesor=="IntelPhi"'
          rank: 'TARGET.Mips'
    - type: brooklyn.entity.webapp.ControlledDynamicWebAppCluster
      name: webApp
      location: cloudlightning-openstack
      cloudlightning.config:
        service-requirements:
        - type: classad
          requirement: 'Arch=="Intel"'
      brooklyn.config:
        wars.root: http://example.cloudlightning.io/webapp/webapp.war
        http.port: 9280+
        proxy.http.port: 9210+
        java.sysprops:
          cloudlightning.example.ray.url:
            $brooklyn:formatString("drmaa://%s",
            component("RaytracingApplicationId").
            attributeWhenReady("drmaa.url"))

Listing 4.3 An Abstract Blueprint

This service is handled by the CloudLightning SDE by interpreting the 
provided application information (in this case, the type) and the corre- 
sponding matching information. The information needed for the normal 
SDE operation is defined at the service level, under the cloudlightning. 
config attribute. 

The relevant attributes handled by the SDE at the “service- 
requirements” level are: 


• Type: this field defines the syntax used for expressing this
requirement. Currently the only defined syntax is based on the ClassAds
system.⁴

• Requirements: this field defines the expression interpreted by the SOSM
system to identify the appropriate resource required for this service.

• Rank: this field defines the way of ranking the possible solutions
obtained from the underlying SOSM infrastructure; this expression
might be used to prefer resources by various attributes, for example
based on power consumption or computing power.


The “requirements” attribute is aimed at restricting the resources that
the SOSM subsystem can consider when choosing the proper implementation
for the user-requested service. This attribute is expected to be used by
HPC applications to express their performance requirements, and it is
complemented by the “rank” attribute, used for expressing preference
regarding the available and matching resources.
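
The interplay between the two attributes can be illustrated with a short Python sketch: the requirement filters the candidate resources, and the rank orders those that remain. The sketch does not parse ClassAd syntax; the requirement and rank are written as plain Python functions, and the candidate resources and their attribute values are invented for the example. A higher rank value is treated as more preferred, as in the ClassAd matchmaking convention.

```python
from typing import Any, Callable, Dict, List

Resource = Dict[str, Any]


def match_and_rank(resources: List[Resource],
                   requirement: Callable[[Resource], bool],
                   rank: Callable[[Resource], float]) -> List[Resource]:
    """Keep the resources satisfying the requirement, best-ranked first."""
    matching = [resource for resource in resources if requirement(resource)]
    return sorted(matching, key=rank, reverse=True)


# Invented candidate resources; attribute names follow the example blueprint.
candidates = [
    {"Name": "node-a", "Arch": "Intel", "CoProcesor": "IntelPhi", "Mips": 1200},
    {"Name": "node-b", "Arch": "Intel", "CoProcesor": "None", "Mips": 3000},
    {"Name": "node-c", "Arch": "Intel", "CoProcesor": "IntelPhi", "Mips": 2500},
]


def requirement(resource: Resource) -> bool:
    # Python counterpart of: Arch=="Intel" && CoProcesor=="IntelPhi"
    return resource["Arch"] == "Intel" and resource["CoProcesor"] == "IntelPhi"


def rank(resource: Resource) -> float:
    # Python counterpart of: TARGET.Mips
    return resource["Mips"]


ranked = match_and_rank(candidates, requirement, rank)
```

Here node-b is excluded despite having the highest Mips value, because ranking is only applied to resources that already satisfy the requirement.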


4.6.3  CloudLightning Blueprint 


The CloudLightning Blueprint represents the outcome of the Service 
Decomposition Operation and basically represents a fully qualified 
Blueprint document that can be handled by the CAMP framework (in our 
case, Brooklyn). 

As seen in Listing 4.4, all “abstract” specifications have been replaced 
with concrete ones. For example, the cloudlightning.entity.meta. 
RaytracingApp type has been replaced with another type understood by 
Brooklyn (cloudlightning.entity.impl.HPCCluster). This new type is 
complemented by a new set of attributes that provide deployment-specific 
information. 

It is important to note that the “location” attribute has been customised
to provide CloudLightning-specific information; in this case, it contains
a handle provided by the underlying SOSM subsystem
that can be used at deployment time for synchronising information
between the various subsystems. Notice that the
cloudlightning.entity.impl.HPCCluster type is known to Brooklyn because it
is registered by the EAO in the corresponding catalogue.

    id: jetty-node-with-raytracing-compute
    name: "Web Application With Raytracing Computing Resource"
    origin: http://cloudlightning.io/
    locations:
    - cloudlightning-openstack
    services:
    - type: cloudlightning.entity.impl.HPCCluster
      name: RaytracingApplicationId
      location:
        cloudlightning-openstack:
          session:handle: "b4cfc054-b760-4482-a2ce-96a65b3d72d0"
      brooklyn.config:
        cloudlightning.deployment:
          puppet.manifests.location: "http://p.cloudlightning.io/m/intelphicluster"
    - type: brooklyn.entity.webapp.ControlledDynamicWebAppCluster
      name: webApp
      location:
        cloudlightning-openstack:
          session:handle: "26670286-a0ad-499e-9fef-f665d156e27e"
      brooklyn.config:
        wars.root: http://example.cloudlightning.io/webapp/webapp.war
        http.port: 9280+
        proxy.http.port: 9210+
        java.sysprops:
          cloudlightning.example.ray.url:
            $brooklyn:formatString("drmaa://%s",
            component("RaytracingApplicationId").
            attributeWhenReady("drmaa.url"))

Listing 4.4 The CloudLightning Blueprint


4.7 EXAMPLE OF APPLICATION CREATION AND DEPLOYMENT


The architecture of the CloudLightning Gateway Service was presented
previously in Sect. 4.5. This section demonstrates, using the example of a
raytracing application, the ease with which the application topology can be
created and deployed using the CloudLightning Gateway Service. This
use case is used to illustrate a user’s interactions with the Gateway Service,
highlighting the resource optimisation feature. The remainder of this
section provides a brief overview of the steps to be taken to safely create,
optimise, and deploy the raytracing application on the CloudLightning
environment. Some of the essential steps are also depicted in screenshots
taken from the actual system.

The process is as follows: 


Step 1: To initialise the system, start the Alien4Cloud service.

Step 2: Add the plugin to the desired orchestrator (CloudLightning
uses Brooklyn-TOSCA as the underlying orchestrator). After
the plugin is loaded, Alien4Cloud will present the orchestrator
in the list of available plugins.

Step 3: Create a new orchestrator from the UI and link it to the newly
added plugin.

Step 5: Before one can connect the orchestrator instance from
Alien4Cloud to the underlying orchestrator (basically, the
SOSM subsystem), one should ensure that the Gateway Service
Orchestrator is running. This step is not mandatory, but it is
advised.

Step 6: From the web console one can connect to the bespoke
orchestrator. Before any further steps can be taken, wait until the
orchestrator state is CONNECTED.

Step 7: After the orchestrator is connected, download the CSAR
archive from a remote git repository. The orchestrator comes with git
integration functionalities, and the only requirement is to have stored
all custom CSAR files in such a repository. In the case of the raytracing
example, one has to enter the predefined git credentials and URL. The
download process of the CSAR archive starts only after one
clicks the Import button.

Step 8: Add the CloudLightning plugin to have access to the
CloudLightning functionalities.

Step 9: For the creation of new applications one has to use the
functionalities exposed by Alien4Cloud, more precisely the New
Application panel. The CSAR archive may contain already
defined application templates, and one can select some of those
for the intended application design.

Step 10: As soon as the application creation step is finished, one can
view the design and application in its home panel.

Step 11: The previously defined topology contains four types of nodes,
which can be viewed in the Topology tab (see Fig. 4.10). It is
also possible to view the newly created topology in YAML
format by pressing the YAML tab in the designer.

Step 12: Next, enter the CloudLightning Optimisation Panel and start
the optimisation process from the SOSM Optimiser button
(see Fig. 4.11). On the left-hand side, one can view the
endpoint for the SDE.

Step 13: Check that the SDE is up and running. When the optimisation
process is finished, one can notice that the abstract nodes
have also been replaced with concrete ones in the application
designer.

Step 14: As a final step, prepare for the deployment of the application by
entering the Deployment Panel. The orchestrator has
already sent information about locations to Alien4Cloud and
one has only to select the desired location.

Step 15: By moving to the Deploy tab one can trigger the actual
deployment of the application. This step is performed by pressing the
Deploy button and waiting until it finishes. Once the button is pressed,
one can also follow the progress of the deployment in the
orchestrator console.


4.8 CONCLUSION


This chapter presented the CloudLightning Gateway Service, a
user-friendly interface that enables users to create and deploy applications with
minimum knowledge regarding the resource selection process. The
Gateway Service is a key component of the CloudLightning system that
facilitates application lifecycle management in the context of a cloud
environment. Users can design the application topology using the Drag &
Drop mechanism of the Gateway Service UI and link together the
components of their application. From here, the topology is sent to the SDE,
which is responsible for interacting with the SOSM system. The SDE
translates the information from the application topology into a specific
CloudLightning Blueprint, using the CloudLightning Service Description
Language. Next, the SOSM handles the resource discovery process, assigning
the most suitable set of resources for a user application, based on the
received CloudLightning Blueprint. In the following step, the SOSM sends
back to the SDE a CloudLightning Blueprint, with a proposed resource
for each component of the application topology. In the end, the user may
review the final version of their application topology, with the assigned
resources, and start the process of application deployment.

2. Selea, T., Dragan, I., & Fortis, T. F. (2017, April). The
CloudLightning approach to cloud-user interaction. In Proceedings
of the 1st International Workshop on Next generation of Cloud
Architectures, Vol. 4, ACM.


NOTES 


1. The term CAMP provider is used in the sense defined by the CAMP
specification, basically “an implementation of the service aspects of this
specification.”

2. Abstract Blueprints are those blueprints that will later be filled with
concrete resources by the CL-System.

3. https://brooklyn.apache.org/learnmore/theory.html

4. https://research.cs.wisc.edu/htcondor/classad/classad.html

5. One keeps definitions of services in CSAR format in a remote repository.

reviewed. Furthermore, a recently proposed class of power models for
heterogeneous CPU-Accelerator-based hardware is discussed. Finally,
large-scale simulations for traditional and Self-Organised and Self-Managed
cloud environments are presented and compared.


Keywords CloudLightning simulator · Self-organisation · Self-management ·
Scalability · Large-scale simulations


5.1 INTRODUCTION 


Cloud simulation tools have been extensively used for the analysis of cloud 
data centres, since the cost of experimentation using various scenarios is 
low. A number of different aspects, related to cloud environments, can be 
studied through simulation including resource allocation strategies, live 
migration of running applications to more efficient data centre resources, 
energy consumption, and hardware resource utilisation. Several cloud 
simulation tools have been developed during the past few years focusing 
on different aspects of cloud environments. These tools can be categorised 
into: 


• Discrete Event Simulators (DES): These examine macro-scale
phenomena, such as application events that take place at certain moments
in time, while completely disregarding micro-scale phenomena,
including network packet communication. DES are used for
large-scale simulations and focus, among other things, on the
behaviour of cloud environments in terms of service delivery,
Virtual Machine (VM) allocation policies, utilisation of resources,
and the energy consumption of data centres.

• Packet-Level Simulators (PLS): These examine micro-scale
phenomena related to cloud environments, including packet loss and
network communication protocols. PLS offer high levels of accuracy, but
at the cost of performance: large-scale data centres cannot
be studied due to the restrictive resolution of the simulations.


Cloud infrastructures continue to grow in both size and diversity to
cater for demand in terms of both user and data volumes and the variety
of hardware resources. As a result, existing cloud simulation tools cannot
be used to efficiently simulate these heterogeneous environments at scales
several orders of magnitude greater than traditional data centres. By 2020,
hyperscale data centres will account for a substantial portion of all cloud
workloads and data (Cisco 2016). Furthermore, as hyperscale data centres
consist of servers in distinct geographical locations, the efficient
management of such infrastructures becomes more difficult, resulting in
network congestion and underutilisation of resources. Resource
heterogeneity further exacerbates these challenges. While hyperscale data
centre operators increasingly offer specialised hardware, such as Graphics
Processing Units (GPUs), Many Integrated Cores (MICs), and
Field-Programmable Gate Arrays (FPGAs), existing cloud simulation tools do
not support them. The efficient exploitation of the hardware
infrastructure of heterogeneous hyperscale cloud environments has been a
topic of great importance in recent years; thus, cloud simulation tools
that can study heterogeneous cloud environments at hyperscale need to be
developed.

The remainder of this chapter is organised as follows. Section 5.2 pro- 
vides a summary review of common cloud simulation frameworks used by 
the scientific community and their limitations. A new simulation frame- 
work, the CloudLightning Simulator, designed to simulate hyperscale 
cloud environments composed of heterogeneous resources is presented in 
Sect. 5.3. This is followed by a discussion of initial experimentation using 
the CloudLightning Simulator to compare service delivery of three appli- 
cation scenarios: oil and gas exploration, ray tracing, and genomics, using 
(i) conventional cloud service delivery and (ii) cloud service delivery using 
a self-organising self-managing (SOSM) approach. 


5.2 CLOUD SIMULATION FRAMEWORKS 


During the last decade, various cloud simulation frameworks have been 
proposed, such as CloudSim (Calheiros et al. 2011), DCSim (Tighe et al. 
2012), GDCSim (Gupta et al. 2011), GreenCloud (Kliazovich et al. 
2012), iCanCloud (Nunez et al. 2012), and CloudSched (Tian et al. 
2015). However, no existing cloud simulation framework is designed for 
hyperscale simulations. 

One of the main limitations of existing cloud simulation tools is the
lack of scalability. Most existing cloud simulation tools do not support
parallelism; thus, the simulation of very large data centres is not
possible (Byrne et al. 2017). Parallelism is of great importance for the
simulation of hyperscale cloud environments since both computational work
and memory requirements can be distributed among multiple nodes, reducing
the execution time significantly and enabling the simulation of
large-scale data centres.

An important factor influencing the scalability of the extant simulation
tools is memory requirements. In DES, a large number of events must be
created and retained. The number of these events is closely related to the
number of resources simulated as well as the input tasks. Discrete
Event-based simulators initialise the list of tasks that will be executed
over the whole simulation and gradually augment it with new events as time
advances. This process requires retaining a very large list in memory,
augmenting it with new events, and sorting it so that events are performed
in the correct order. Thus, memory requirements increase significantly
with the number of resources or the simulation time. Memory restrictions
also arise from the high level of detail of the simulated components, as
in the case of the iCanCloud and GreenCloud frameworks, which becomes
prohibitive in very large-scale executions.

The effective management of resources becomes a significant challenge as
their number increases. More specifically, strategies that require the
detection of specific hardware either cannot be applied or incur
significant computational cost when hyperscale systems are considered.
Also, status information about the underlying hardware resources becomes
outdated, which makes efficient management of the system more
challenging. Specialised strategies are required in hyperscale cloud
environments for the efficient and up-to-date management of the system.
Such strategies are not supported in existing simulation frameworks, and
thus the simulation of hyperscale systems is difficult to perform.

Finally, the inclusion of heterogeneous resources is not supported by 
existing cloud simulation tools. Simple generic models are required for the 
simulation of heterogeneous resources in order to be integrated in cloud 
simulation environments (Makaratzis et al. 2017; Giannoutakis et al. 
2017). 


5.3 CLOUDLIGHTNING SIMULATOR 


Unlike existing frameworks, the CloudLightning Simulator has been
designed from the ground up as a massively scalable solution, able to
simulate hyperscale data centres consisting of millions of cloud
nodes/servers. The framework is written in C++ and is parallelised using
Message Passing Interface (MPI) (Gropp et al. 1996) and OpenMP (Dagum and
Menon 1998) to enable the efficient handling of hyperscale simulations.
CloudLightning supports the simulation of heterogeneous infrastructures
(including GPUs, MICs, and FPGAs/DFEs) that are commonly used for the
acceleration of High Performance Computing applications. One important
characteristic of the developed framework is the use of a time-advancing
loop, a technique that removes the need for pre-computation and storage
of future events, resulting in a significant reduction of its memory
requirements. This allows the integration of dynamic resource allocation
policies, such as SOSM, enabling the efficient management of computer
resources for simulating hyperscale environments. Moreover, the
CloudLightning Simulator places an emphasis on the simplicity of the
models it uses, focusing on models that require a reduced number of
computations to produce the results of the simulations without loss of
accuracy. Finally, all inputs and outputs of the simulator are
represented graphically.

The remainder of this section presents the generalised and extensible 
CloudLightning simulation framework for simulating heterogeneous 
resources using an SOSM approach. 


5.3.1 Architecture and Basic Characteristics of the Parallel 
CloudLightning Simulation Framework 


The CloudLightning Simulator was designed to simulate clouds relying 
on the Warehouse Scale Computer (WSC) architecture (Barroso et al. 
2013). WSC has been adopted by a multitude of companies including 
Google, Amazon, Yahoo, Microsoft, and Apple, and has been widely used 
in the design of cloud environments (Mars 2012). In the WSC architec- 
ture, interconnected cloud computing nodes are grouped into cells that 
are centrally managed (Fig. 5.1). 

In this architecture, the Gateway service is the entry point of the
system: a cloud entity that receives resource requests from end users
and redirects them to the appropriate Cells. A conceptual cloud
architecture with multiple Cells is presented in Fig. 5.2. The resources
are organised and monitored by the Cell manager’s broker, which is
responsible for the provision of appropriate resources to end user
requests and for the deployment of incoming tasks to the available
resources. The broker component is composed of multiple services,
including orchestration, telemetry, and identity services. Hyperscale
cloud environments consist of a very large number of Cells.

Fig. 5.1 Warehouse Scale Computer abstract architecture

In the CloudLightning simulation framework, each Cell is hosted on a
different computing node of a distributed system, while the Gateway
service is hosted on the master computing node. The communication
between the Gateway service and the Cells is performed using the MPI
framework. The following operations are performed by each Cell
(Filelis-Papadopoulos et al. 2017, b):

• Receiving simulation parameters

• Initialisation of different components, including hardware resources,
the broker, network, telemetry, and the SOSM engine

• Receiving the task queue in each time-step

• Searching for available resources for the execution of the tasks, using
the SOSM engine

• Updating the state of the resources and controlling the execution of
the tasks

• Communicating status information to the Gateway Service




SIMULATING HETEROGENEOUS CLOUDS AT SCALE 


Fig. 5.2 Abstract cloud architecture with multiple Cells


126 C.K. FILELIS-PAPADOPOULOS ET AL. 


The operations performed by the Gateway service are the following
(Filelis-Papadopoulos et al. 2017, b):

• Retaining simulation inputs and communicating data to the Cells for
the initialisation of the simulation components

• Creation of the task queue at each time-step, fragmentation of the
task queue into subqueues, and communication of the subqueues to the
Cells, while maintaining load balance across all Cells

• Receiving status information from the Cells

• Processing and storing historical statistics and metrics
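
The fragmentation step can be pictured with a short sketch. The
round-robin split is an assumption made for illustration (the text only
states that load balance is maintained across Cells), and the function
name fragment_task_queue is hypothetical; in the simulator itself the
subqueues would be communicated to the Cells over MPI.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def fragment_task_queue(tasks: Sequence[T], num_cells: int) -> List[List[T]]:
    """Split one time-step's task queue into one subqueue per Cell.

    Round-robin assignment is an illustrative assumption: it keeps the
    subqueue lengths within one task of each other.
    """
    if num_cells < 1:
        raise ValueError("num_cells must be at least 1")
    subqueues: List[List[T]] = [[] for _ in range(num_cells)]
    for index, task in enumerate(tasks):
        subqueues[index % num_cells].append(task)
    return subqueues


# Ten tasks arriving in one time-step, distributed over four Cells.
subqueues = fragment_task_queue(list(range(10)), num_cells=4)
sizes = [len(queue) for queue in subqueues]
```

Any other balancing rule, for example one weighted by the current load
reported by each Cell, could replace the modulo assignment without
changing the surrounding logic.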


The parallelisation of the CloudLightning Simulator in distributed
systems is of great importance, since simulating hyperscale
infrastructures is a computationally and memory-intensive process. For
this reason, various components of the CloudLightning Simulator use the
OpenMP framework in different ways to accelerate their computations on
shared memory multiprocessors. The Gateway Service processes statistics
in parallel, while the Cells perform resource discovery, task
deployment, and the update of the resources’ state on different
multiprocessor cores. The SOSM techniques are also performed in parallel.

Figure 5.3 presents the software architecture of the CloudLightning
Simulator (Filelis-Papadopoulos et al. 2017).


5.3.2 SOSM Engine 


One of the most important characteristics of the CloudLightning Simulator 
is the use of SOSM techniques to control the underlying resources of the 
Cells in a more efficient manner (Filelis-Papadopoulos et al. 2017). 

In traditional cloud architectures, the resources are managed by the
broker, a central entity that is responsible for searching for and
deploying available resources in response to incoming task requests,
collecting data on the state of all underlying resources, and managing
all underlying resources of the data centre. This centralised approach
has limitations due to the computational complexity involved in locating
specific hardware, especially as the number of resources increases.
Locating the most appropriate server for the execution of a task is a
computationally expensive operation in large data centres, and it is
generally avoided in favour of strategies such as the “first-fit
approach,” where a task is deployed on the first available server or
coalition of servers. This type of strategy is not effective, though, in
terms of either computational or energy efficiency, and largely results
in the underutilisation of the available resources (Filelis-Papadopoulos
et al. 2017). More effective strategies, such as SOSM, need to be applied
to achieve high levels of resource utilisation and thus computational and
energy efficiency.

Fig. 5.3 Software architecture of the parallel CloudLightning simulation
framework

In the CloudLightning architecture, each Cell is organised in a
hierarchical tree structure. As discussed earlier, the tree contains
different entities, including prescription Routers (pRouters),
prescription Switches (pSwitches), and virtual Rack Managers (vRMs).
Figure 5.4 presents an example of the CloudLightning tree structure. In
this structure, the resources are locally managed by the vRMs, which in
turn are locally managed by the pSwitches, while the pSwitches are
locally managed by the pRouters. The local management of the
architectural components allows the efficient collection and analysis of
data that can lead to an improved decision-making process. Each component
can describe the state of its underlying resources: metrics describing
the state of the resources are collected over an interval and averaged by
each component to form its own state. Also, weights describing the
desired state of the system are communicated from the Gateway Service to
the underlying components. From these metrics and weights, each
component's Suitability Index is computed. The Suitability Index
expresses how appropriate a component is to receive an incoming task. By
using the Suitability Index, each incoming task can subsequently be
directed to the most efficient resources.

Fig. 5.4 Hierarchical structure of the SOSM engine
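
The interplay of averaged metrics and weights can be illustrated with a
minimal sketch. The exact Suitability Index formula is not given in this
section, so the weighted sum below is purely an assumption, as are the
metric names (free_cpu, free_memory), the weight values, and the
function names.

```python
from statistics import mean
from typing import Dict, List

Metrics = Dict[str, float]


def component_state(samples: List[Metrics]) -> Metrics:
    """Average each metric over the samples collected during one interval."""
    return {name: mean(sample[name] for sample in samples)
            for name in samples[0]}


def suitability_index(state: Metrics, weights: Metrics) -> float:
    """Hypothetical Suitability Index: a weighted sum of averaged metrics."""
    return sum(weights[name] * state[name] for name in weights)


# Hypothetical metrics: fraction of free CPU and free memory under each vRM.
samples_by_vrm = {
    "vrm_a": [{"free_cpu": 0.2, "free_memory": 0.5},
              {"free_cpu": 0.4, "free_memory": 0.5}],
    "vrm_b": [{"free_cpu": 0.8, "free_memory": 0.6},
              {"free_cpu": 0.6, "free_memory": 0.4}],
}
# Weights as they might be communicated by the Gateway Service (illustrative).
weights = {"free_cpu": 0.7, "free_memory": 0.3}

indices = {name: suitability_index(component_state(samples), weights)
           for name, samples in samples_by_vrm.items()}
best = max(indices, key=indices.get)  # component that receives the next task
```

Under these assumed values the second vRM has more free CPU on average
and is therefore selected; changing the weights changes which state the
system steers towards.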

The exchange of metrics and weights between the components is part
of the Self-Management actions and is performed by all the components
of the SOSM engine. The Self-Organisation techniques, on the other
hand, are solely performed by the vRMs and the pSwitches. In the case of
vRMs, there can be an exchange of resources between vRMs that are
hosted by the same pSwitch, in order to maximise the efficiency of the
system and to host tasks that require more resources than are available
on a vRM. New vRMs can also be created, while vRMs that do not contain
any resources to manage can be destroyed. Similarly, pSwitches that are
hosted by the same pRouter can exchange vRMs; new pSwitches can be
created, while existing pSwitches can be dismissed when they have no
vRMs to manage.

Each pRouter of a Cell is homogeneous, as it contains resources of the 
same type. In order to maintain the homogeneity, Self-Organising actions 
are not performed at the pRouter level; thus, pSwitches cannot be 
exchanged between pRouters. For this reason, pRouters are the entry 
point for the selection of a specific type of resource inside a Cell (Filelis- 
Papadopoulos et al. 2017). 

The SOSM system significantly improves the scalability of cloud
environments since the most appropriate hardware for the execution of a
task can be located quickly and with low computational cost, even in data
centres with a very large number of resources. In the CloudLightning
Simulator, the SOSM engine is implemented in parallel using the OpenMP
framework.


5.3.2.1 Power Consumption Modelling 

To estimate the power consumption of large-scale heterogeneous data 
centres, a number of different power models for both Central Processing
Unit (CPU) servers and combined CPU-accelerator pairs were developed.
The power models are generic with low computational cost (Filelis- 
Papadopoulos et al. 2017; Giannoutakis et al. 2017). For this reason, the 
CloudLightning Simulator is capable of computing the power consump- 
tion of very large heterogeneous data centres without a significant impact 
on its scalability. The following subsection gives a detailed presentation of 
the integrated power consumption models. 


CPU Power Models 
Piecewise interpolation methods between recorded CPU power consump- 
tion levels, and generic models that estimate the trend of the power- 
utilisation diagram of CPUs by using the idle and maximum power 
consumption of the CPU servers, have been integrated. 

The interpolation methods are performed between recorded CPU
power consumption levels that are available mainly as part of the
Standard Performance Evaluation Corporation (SPEC) benchmark
(SPEC 2008). Existing simulators, such as CloudSim, use linear
interpolation between power measurements on rounded utilisation intervals
(i.e. 0%, 10%, 20%, etc.) (Beloglazov and Buyya 2012). In order to
achieve improved accuracy, the interpolation methods in the
CloudLightning Simulator are applied on the exact utilisation intervals
of the power measurements (e.g. 0%, 10.2%, 19.7%, etc.), as the error of
the rounded interpolation intervals increases when simulating very
large data centres (Giannoutakis et al. 2017). Two different
interpolation methods were used: linear interpolation and “not-a-knot”
cubic spline interpolation.
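
The difference between rounded and exact utilisation intervals can be
seen in a small sketch of the linear variant. The measurement values
below are hypothetical and are not SPEC data; the function name
interpolate_power is likewise an assumption.

```python
from bisect import bisect_right
from typing import Sequence


def interpolate_power(utilisation: float,
                      measured_util: Sequence[float],
                      measured_power: Sequence[float]) -> float:
    """Piecewise-linear power estimate between recorded measurements.

    measured_util must be sorted in ascending order; the points may be
    unevenly spaced, as with exact utilisation levels.
    """
    if utilisation <= measured_util[0]:
        return measured_power[0]
    if utilisation >= measured_util[-1]:
        return measured_power[-1]
    hi = bisect_right(measured_util, utilisation)
    lo = hi - 1
    span = measured_util[hi] - measured_util[lo]
    fraction = (utilisation - measured_util[lo]) / span
    return measured_power[lo] + fraction * (measured_power[hi] - measured_power[lo])


# Hypothetical power readings in Watts (not real SPEC measurements).
power = [45.0, 80.0, 100.0, 270.0]
exact_util = [0.0, 0.102, 0.197, 1.0]    # levels at which power was recorded
rounded_util = [0.0, 0.10, 0.20, 1.0]    # the same levels after rounding

exact_estimate = interpolate_power(0.15, exact_util, power)
rounded_estimate = interpolate_power(0.15, rounded_util, power)
```

The two estimates differ by a fraction of a Watt per server in this
example; summed over a very large number of simulated servers, such
per-server differences accumulate.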

Generic models were also integrated, since they require less
computational cost and fewer power measurements than the interpolation
methods. The models estimate the power consumption of CPU servers by
using the utilisation of the CPU server and its power consumption in idle
and max states. The linear, square, cubic, and square root models that
have been used in existing cloud simulators (e.g. CloudSim) were
integrated (Beloglazov and Buyya 2012). For the CloudLightning Simulator,
a generic CPU power model was used based on a third-degree polynomial,
which estimates more accurately the trend of the power-utilisation
diagram of CPU servers (Filelis-Papadopoulos et al. 2017). The trend of
the generic models compared with the actual CPU measurements provided by
SPEC (SPEC 2008) for an HP Proliant DL560 Gen 9¹ is presented in
Fig. 5.5.
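
As a rough illustration, the four CloudSim-style generic models can be
written as one function of utilisation with a different exponent each.
This compact parameterisation and the names below are assumptions made
for the sketch; the third-degree polynomial model of the CloudLightning
Simulator is not reproduced, since its coefficients are not given here.
The idle and maximum power values are the CPU figures from the hardware
characteristics table.

```python
def generic_power(utilisation: float, p_idle: float, p_max: float,
                  exponent: float = 1.0) -> float:
    """Power estimate from idle and maximum power only.

    P(u) = P_idle + (P_max - P_idle) * u ** exponent, with u in [0, 1].
    """
    if not 0.0 <= utilisation <= 1.0:
        raise ValueError("utilisation must lie in [0, 1]")
    return p_idle + (p_max - p_idle) * utilisation ** exponent


# Exponents corresponding to the linear, square, cubic and square root models.
MODELS = {"linear": 1.0, "square": 2.0, "cubic": 3.0, "square_root": 0.5}

P_IDLE, P_MAX = 44.9, 269.0  # Watts, CPU row of the hardware table

# Estimated power of one CPU server at 25% utilisation under each model.
estimates = {name: generic_power(0.25, P_IDLE, P_MAX, exponent)
             for name, exponent in MODELS.items()}
```

All four models agree at 0% and 100% utilisation and differ only in how
they bend between those two points, which is what a comparison against
measured power-utilisation curves evaluates.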

Existing cloud simulators (e.g. GreenCloud and CloudSim) support the
use of real application traces in order to compute the power consumption
of the simulated applications in each time-step. This approach would
negatively affect the scalability of the simulator in large-scale
simulations, and for this reason, mean values of real application traces
were computed and integrated. More specifically, the mean value of the
CPU utilisation for each application is used to compute the mean power
consumption of the application. Then, the energy consumption of the
application is computed by multiplying the mean power consumption of the
application by its execution time. This approach has a lower
computational cost, while the energy consumption of the application is
computed with approximately the same accuracy that would have been
obtained if all the power traces were used. This methodology has been
tested, achieving high levels of accuracy in the estimation of the energy
consumption of applications (Makaratzis et al. 2017).

Fig. 5.5 Generic CPU power models compared to the power-utilisation
diagram of an HP Proliant DL560 Gen 9 server
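
The mean-value approach to application energy can be checked on a toy
trace. The sketch below assumes a linear power model and a hypothetical
one-second utilisation trace; neither is taken from the simulator. With
a linear model the two energy figures coincide exactly, whereas with a
non-linear model the mean-based figure is an approximation.

```python
from statistics import mean
from typing import Sequence


def linear_power(utilisation: float, p_idle: float, p_max: float) -> float:
    """Assumed linear power model between idle and maximum power."""
    return p_idle + (p_max - p_idle) * utilisation


def energy_from_mean(trace: Sequence[float], p_idle: float, p_max: float,
                     execution_time_s: float) -> float:
    """Energy (Joules) = power at the mean utilisation x execution time."""
    return linear_power(mean(trace), p_idle, p_max) * execution_time_s


def energy_from_trace(trace: Sequence[float], p_idle: float, p_max: float,
                      step_s: float) -> float:
    """Energy (Joules) accumulated sample by sample over the full trace."""
    return sum(linear_power(u, p_idle, p_max) * step_s for u in trace)


trace = [0.2, 0.6, 0.9, 0.5]   # hypothetical CPU utilisation, one sample per second
P_IDLE, P_MAX = 44.9, 269.0    # Watts, CPU row of the hardware table

mean_based = energy_from_mean(trace, P_IDLE, P_MAX, execution_time_s=4.0)
trace_based = energy_from_trace(trace, P_IDLE, P_MAX, step_s=1.0)
```

The mean-based computation touches the trace once, when the application
model is built, rather than in every simulated time-step.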


Combined CPU-Accelerator Power Models 

A generic power consumption model was used for the estimation of the
power consumption of accelerators such as GPUs, MICs, and DFEs
(Giannoutakis et al. 2017). This model was built around the idea that an
accelerator draws its maximum power when an application is executed on
the accelerator, and its idle power when the application is executed
only on the CPU. This binary model provides simplicity and increased
accuracy (Makaratzis et al. 2017). The model for the power consumption of
hardware accelerators is described as follows:

$$P_{acc}(p) = (1 - p)\,P_{acc,min} + p\,P_{acc,max}$$

where $P_{acc,min}$ and $P_{acc,max}$ are the minimum and maximum power
consumption values, respectively, that the application can consume on the
accelerator, while $p$ is the percentage of the application that is
parallelised on the accelerator, thus $0 \le p \le 1$ at each moment in
time. As with the utilisation parameters of the CPU power model, the mean
value of the parameter $p$ is computed based on real application traces;
thus, the mean value of the power that is consumed on the accelerator is
computed for the total execution time of the application. The combined
CPU-accelerator mean power consumption of the application is computed as
the sum of the mean power consumption of the CPU server and the mean
power consumption of the accelerator. The energy consumption of an
application that is executed on a heterogeneous node is computed by
multiplying the combined CPU-accelerator mean power consumption by the
execution time of the application.
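
A worked example may help to fix the combined model. The sketch below
uses the GPU idle and maximum power from the hardware characteristics
table as stand-ins for the minimum and maximum accelerator power, which
is an assumption; the CPU mean power, the value of p, and the execution
time are hypothetical.

```python
def accelerator_power(p: float, p_acc_min: float, p_acc_max: float) -> float:
    """Mean accelerator power: (1 - p) * P_acc,min + p * P_acc,max."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    return (1.0 - p) * p_acc_min + p * p_acc_max


def combined_energy(cpu_mean_power_w: float, p: float, p_acc_min: float,
                    p_acc_max: float, execution_time_s: float) -> float:
    """Energy (Joules) of an application on a CPU-accelerator pair."""
    total_power = cpu_mean_power_w + accelerator_power(p, p_acc_min, p_acc_max)
    return total_power * execution_time_s


# GPU row of the hardware table: 50 W idle, 400 W maximum (used as min/max).
gpu_power = accelerator_power(0.7, p_acc_min=50.0, p_acc_max=400.0)

# Hypothetical application: 150 W mean CPU power, running for 100 seconds.
energy = combined_energy(150.0, 0.7, 50.0, 400.0, execution_time_s=100.0)
```

With p = 0.7 the accelerator contributes 0.3 x 50 + 0.7 x 400 = 295 W on
average, so the pair draws 445 W and the application consumes 44,500 J
over its 100 seconds.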

To conclude, in order to keep the computational cost low, generic CPU 
and accelerator power models were integrated in the CloudLightning 
Simulator. The simplicity of the models is of great importance since mod- 
els that are based on architectural details of the hardware resources require 
a substantial number of computations, considering the heterogeneity and 
the very large number of resources in the simulations. These models were 
validated on heterogeneous testbeds and a good accuracy level was 
achieved (Makaratzis et al. 2017). 


5.3.2.2 Memory, Storage, and Network Modelling 

Detailed modelling of memory would negatively affect the scalability of
the simulator, especially in large-scale simulations, since it would
require an increased number of computations. Memory was implemented as a
resource, measured in GBytes, that is used in the allocation of VMs to
physical servers. Memory overcommitment was also implemented; thus,
the total available memory was computed as the product of the total
physical memory and the overcommitment ratio. The power consumption of
memory was included in the power consumption of the CPU servers,
eliminating the need for a separate memory power consumption calculation.

The modelling of storage was also kept simple in order to keep the
computational cost low. The storage was implemented as a resource
measured in TBytes. Global storage was not implemented, though its impact
can be added directly to the time span of tasks. Detailed modelling of
the power consumption of storage was not implemented since it would
require a substantially large number of computations, which would
negatively affect the scalability of the simulator. The energy
consumption of storage is considered to be included in the energy
consumption of the CPU servers, similar to memory modelling.

The network was implemented as a global component, visible from all
the underlying resources, with the network bandwidth being shared
among the arriving tasks of the system. When the requested network
bandwidth exceeds the available capacity, the execution of applications
is negatively affected in terms of execution time. It should be noted
that the network model of the CloudLightning Simulator was implemented
through a catalogue of tasks, retaining all tasks executing at a given
time-step. A linear model for computing the time required to transfer
initial data and output data was implemented with a function of the
following form:


NT (t) = fileSize / bandwidth 


where fileSize is the size of the file to be transferred and bandwidth is the 
available physical bandwidth. 
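
As a small illustration of this linear model, the sketch below computes
the transfer time of the input and output data of a task. The units (MB
and MB per second), the function names, and the file sizes are
assumptions made for the example.

```python
def transfer_time(file_size_mb: float, bandwidth_mbps: float) -> float:
    """NT = fileSize / bandwidth, in seconds (MB divided by MB per second)."""
    if bandwidth_mbps <= 0:
        raise ValueError("bandwidth must be positive")
    return file_size_mb / bandwidth_mbps


def data_transfer_time(input_mb: float, output_mb: float,
                       bandwidth_mbps: float) -> float:
    """Time to move the initial data in and the output data out."""
    return (transfer_time(input_mb, bandwidth_mbps)
            + transfer_time(output_mb, bandwidth_mbps))


# Hypothetical task: 500 MB of input, 250 MB of output, 5 MBps available.
total_s = data_transfer_time(500.0, 250.0, 5.0)
```

Because the model is linear, halving the bandwidth available to a task
doubles its transfer time, which is how congestion lengthens execution.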


5.3.2.3 Application Models 

In the design of the CloudLightning Simulator, the execution of VMs is
part of a given task and their life cycle is directly connected to it.
Each task is defined based on the following characteristics
(Filelis-Papadopoulos et al. 2017):

• Type of application (Genomics, Oil and Gas, Ray Tracing)

• Available implementations (CPU-only, CPU+GPU, CPU+DFE, CPU+MIC)

• Number of instructions (in Millions of Instructions [MIs])

• Required number of VMs

• Required number of processing units per VM

• Required memory per VM (in GBytes)

• Required storage per VM (in TBytes)

• Required accelerators per VM

• Required network bandwidth

The minimum and maximum values are defined for the actual utilisation
of the CPU, the memory, and the network. The actual resources used
by an application (utilisation) are computed based on application traces
as a percentage of the requested resources over a number of predefined
intervals. These utilisation parameters are considered as mean values
with respect to the total execution time of the application. This
approach keeps the computational cost low, while the desired metrics are
obtained with the same accuracy that would have been obtained if all the
application traces were used.




All task parameters, including the number of instructions, the required 
number of VMs, and memory size, are randomly generated using a uni- 
form random number generator with respect to predefined intervals. The 
intervals are computed based on real application characteristics. 

This approach to application modelling reduces computational cost,
allowing for large-scale simulations, while also providing realistic
results during the simulations.
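
The generation of task parameters can be sketched as follows. The
intervals are those listed for application type 1 in the application
characteristics table; the Task class, the field names, and the choice
of integer sampling for VM and vCPU counts are assumptions made for
illustration.

```python
import random
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Task:
    instructions_mi: float
    num_vms: int
    vcpus_per_vm: int
    memory_gb_per_vm: float
    storage_tb_per_vm: float
    bandwidth_mbps: float


# Predefined intervals for application type 1 (from the application table).
TYPE_1: Dict[str, Tuple[float, float]] = {
    "instructions_mi": (1386.23, 5544.91),
    "num_vms": (1, 16),
    "vcpus_per_vm": (4, 8),
    "memory_gb_per_vm": (4.0, 8.0),
    "storage_tb_per_vm": (0.02, 0.04),
    "bandwidth_mbps": (2.5, 5.0),
}


def generate_task(intervals: Dict[str, Tuple[float, float]],
                  rng: random.Random) -> Task:
    """Draw every task parameter uniformly from its predefined interval."""
    return Task(
        instructions_mi=rng.uniform(*intervals["instructions_mi"]),
        num_vms=rng.randint(*intervals["num_vms"]),            # inclusive bounds
        vcpus_per_vm=rng.randint(*intervals["vcpus_per_vm"]),  # inclusive bounds
        memory_gb_per_vm=rng.uniform(*intervals["memory_gb_per_vm"]),
        storage_tb_per_vm=rng.uniform(*intervals["storage_tb_per_vm"]),
        bandwidth_mbps=rng.uniform(*intervals["bandwidth_mbps"]),
    )


rng = random.Random(42)  # fixed seed so that a run can be repeated
tasks = [generate_task(TYPE_1, rng) for _ in range(4)]
```

Only the interval bounds need to be stored per application type, so the
memory footprint of the application model is independent of the number
of tasks generated.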


5.3.2.4 Execution Models 

Existing cloud simulators generally create a priori task lists for the
whole duration of the simulation, and then augment and sort these lists
in response to events triggered by inputs and so on. The disadvantage is
that simulation data must be stored not only for the current event but
also for future ones, which restricts the execution of large-scale
simulations over long time periods. In contrast, the CloudLightning
Simulator is based on a time-advancing loop, where incoming tasks are
created dynamically in each time-step and where each time-step is
independent from any previous or future ones (Filelis-Papadopoulos et al.
2017). A task list is created at the beginning of each time-step,
removing the need to store the future tasks of the simulation. Creating
task lists per time-step significantly reduces the memory requirements of
the simulation and makes it possible to simulate dynamic components that
change their state according to dynamic strategies, including pRouters,
pSwitches, and vRMs, while allowing for simulation over extended time
periods.

In the execution of tasks, the time-step is used as the control
mechanism of the execution. The size of applications is measured in MIs,
while the computational capability of the physical servers is measured in
Millions of Instructions per Second (MIPS). In each time-step, the number
of instructions that can be executed by the available resources is
subtracted from the total number of instructions of the application. This
time-step-controlled execution model offers significant capabilities
since the impact of various phenomena can be modelled by applying
penalties on the execution of tasks. For example, phenomena such as
performance degradation due to cache sharing or “noisy-neighbours” can be
modelled by reducing the computational capability, meaning that fewer of
the application’s instructions will be executed in the current time-step.
Similarly, the usage of hardware with a higher computational capability,
that is, accelerators, can be modelled by increasing the computational
capability of the current time-step. Service-Level Agreement violations
concerning memory, storage, or network limitations can be modelled by
applying similar penalties in the execution of tasks.

This approach to execution modelling makes the simulator easy to extend,
since any phenomenon can be modelled during a simulation by applying
penalties or gains in the execution of the applications. Also, this
execution model allows the simulation of very long time periods and
millions of cloud servers, since the memory requirements of the execution
model are very low.
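
The time-step-controlled execution described in this section can be
sketched in a few lines. The class and function names, and the single
capability_factor used to express penalties and gains, are assumptions
made for the example and are not taken from the simulator's source.

```python
from dataclasses import dataclass


@dataclass
class RunningTask:
    remaining_mi: float  # instructions still to execute, in MIs
    mips: float          # computational capability of the allocated resources


def advance(task: RunningTask, time_step_s: float = 1.0,
            capability_factor: float = 1.0) -> bool:
    """Execute one time-step and report whether the task has finished.

    capability_factor < 1 models a penalty (e.g. noisy neighbours);
    capability_factor > 1 models a gain (e.g. an accelerator).
    """
    executed_mi = task.mips * capability_factor * time_step_s
    task.remaining_mi = max(0.0, task.remaining_mi - executed_mi)
    return task.remaining_mi == 0.0


def steps_to_finish(total_mi: float, mips: float,
                    capability_factor: float = 1.0) -> int:
    """Run the time-advancing loop for one task and count the time-steps."""
    task = RunningTask(remaining_mi=total_mi, mips=mips)
    steps = 0
    finished = False
    while not finished:
        finished = advance(task, capability_factor=capability_factor)
        steps += 1
    return steps


baseline = steps_to_finish(1000.0, 100.0)                           # no penalty
penalised = steps_to_finish(1000.0, 100.0, capability_factor=0.5)   # degraded
accelerated = steps_to_finish(1000.0, 100.0, capability_factor=2.0)
```

Only the remaining instruction count of each running task is kept
between time-steps; no list of future events is stored.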


5.4 EXPERIMENTAL RESULTS 


This section presents the experimentation framework and the numerical
results obtained by simulating the traditional cloud delivery system and
the SOSM framework.

The experiments were performed on a cluster consisting of four Dell
PowerEdge C4130 nodes, each containing two 10-core Intel Xeon
E5-2630 v4 CPUs running at 2.20 GHz (3.10 GHz Max Turbo frequency)
with 128 GB of Random Access Memory (RAM), and a Dell
PowerEdge R730 node containing two 8-core Intel Xeon E5-2609 v4
CPUs running at 1.70 GHz. During the simulation, the Dell PowerEdge
R730 node was used to host the Gateway service, while the four Dell
PowerEdge C4130 nodes were used to host the Cells.

The time period of the simulation was set to one week (604,800 sec- 
onds), with a time-step of 1 second. The update interval of the Gateway 
Service was chosen to be 200 seconds, while the update interval of the 
pRouters, pSwitches, and vRMs was 20 seconds. The cloud nodes of the 
simulated data centre were selected to use an Intel Xeon E5-2699 v4 
2.20 GHz-based node with 44 cores and 385,063.42 MIPS, 128 GBytes 
of RAM, and 40 TBytes of storage. 

Each Cell consisted of four different types of hardware, that is,
CPUs+GPUs, CPUs+MICs, CPUs+DFEs, and CPU servers with no accelerators.
Each heterogeneous node consisted of a CPU and four accelerators. The
characteristics of the CPUs and the accelerators are presented in
Table 5.1. It is noted that the linear interpolation method on uneven
utilisation intervals was used for the estimation of the power
consumption of the CPU servers, where the power values for the various
utilisation intervals were obtained² from SPEC (SPEC 2008).

During the simulations, three different types of applications were
considered. The characteristics of the applications are presented in
Tables 5.2 and 5.3.

The CloudLightning Simulator was executed for different numbers of
resources, Cells, and submitted tasks. Each Cell was hosted on a Dell
PowerEdge C4130 node, while in the experiments with eight Cells, each
computing node hosted two Cells. Three different configurations
were tested. In the first configuration, 11,000 resources per Cell were
utilised, while the experiment was performed for different numbers of
Cells. Similarly, in the second configuration, 110,000 resources per Cell
were used, and in the third configuration, 1,100,000 resources per Cell
were considered. The maximum number of submitted tasks was set equal
to four per second when one Cell was used, while this number was
multiplied by the number of Cells when additional Cells were used. The VM
allocation policy used was the “first-fit approach,” according to which
tasks are placed on the first available server found.
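
A first-fit policy is simple enough to state in code. The sketch below
considers a single resource dimension (free vCPUs per server), which is
a simplifying assumption; the function name and the server capacities
are hypothetical.

```python
from typing import List, Optional


def first_fit(free_vcpus: List[int], required_vcpus: int) -> Optional[int]:
    """Place a request on the first server with enough free vCPUs.

    Returns the index of the chosen server, or None if the request is
    rejected. The free capacity of the chosen server is reduced in place.
    """
    for index, free in enumerate(free_vcpus):
        if free >= required_vcpus:
            free_vcpus[index] -= required_vcpus
            return index
    return None


servers = [2, 8, 44]               # free vCPUs on three hypothetical servers
first = first_fit(servers, 8)      # fits on the second server
second = first_fit(servers, 8)     # second server is now full, third is used
rejected = first_fit(servers, 50)  # no server is large enough
```

The scan stops at the first match, so no attempt is made to find the
server that would leave the least capacity unused.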


Table 5.2 Hardware characteristics

Hardware   MIPS              Idle power consumption   Max power consumption
                             (Watts)                  (Watts)
CPU          385,063.4268    44.9                     269.0
MIC        1,347,721.9938    30.0                     350.0
DFE        2,310,380.5608    70.0                     100.0
GPU        1,155,190.2804    50.0                     400.0


Table 5.3 Application characteristics

Application type            1                 2                3
Millions of instructions    1386.23-5544.91   462.08-2772.46   693.11-4158.69
Number of VMs               1-16              1-8              14
Number of vCPUs             4-8               8-16             4-8
Memory (GBytes)             4-8               4-8              4-8
Storage (TBytes)            0.02-0.04         0.01-0.02        0.04-0.08
Network bandwidth (MBps)    2.5-5             0.5-1            2.5-5
Network storage (GB)        0-0               0-0              0-0
Implementations             1, 2, 3           1, 2, 3          1, 4
p                           0, 0.7, 0.5       0, 0.8, 0.9      0, 0.9
Table 5.4 presents the outputs, in terms of the number of accepted
tasks, the average processor and accelerator utilisation, the average net- 
work utilisation, the energy consumption of the data centre, and the exe- 
cution time of the CloudLightning Simulator, simulating a traditional 
centralised cloud service delivery system. 

For all configurations, the total number of rejected tasks was high,
with a task rejection rate of ~86% on average. The task rejection was
caused mainly by network congestion appearing early in the simulated
cloud (Fig. 5.6). Although the applications and their corresponding
implementations (Table 5.3) were selected randomly using a uniform
random generator, accelerator implementations started to be rejected
after a period of simulated time, since the network resources are shared
between the resources hosted across a Cell. This leads to the acceptance
of additional CPU tasks, which in general require more computational
time for execution and consequently overload the network.

The estimated energy consumption of the cloud infrastructure
increased with the number of resources per Cell and the number of Cells.
Apart from the idle servers, which consume the minimum power, energy
consumption is expected to increase proportionally as the utilisation
of the cloud increases.

The CloudLightning Simulator was also tested using the SOSM
resource allocation framework, for 100 resources per vRM, 10 vRMs per
pSwitch, and 5 pSwitches per pRouter. The VM allocation policy was
“Task Compaction,” where the system provisions as many VMs as
possible on each physical server. Table 5.5 presents the outputs of the
CloudLightning Simulator, in terms of the number of accepted tasks, the 
average processor and accelerator utilisation, the average network utilisa- 
tion, the energy consumption of the data centre, and the execution time 
of the simulator, when using the SOSM engine. 
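One way to picture a compaction policy of this kind is a best-fit style heuristic that always prefers the fullest server that can still host the VM. The sketch below is only an interpretation of "as many VMs as possible on each physical server"; the function name, the single vCPU capacity dimension, and the tie-breaking rule are assumptions, not the simulator's implementation.

```python
def compact_vms(vm_vcpus, server_capacity, server_count):
    """Pack VMs onto as few servers as possible.

    Returns, for each VM, the index of the server it was placed on, or None
    if no server had enough free vCPUs. Capacity is modelled with a single
    vCPU dimension for simplicity (an assumption of this sketch).
    """
    free = [server_capacity] * server_count
    placement = []
    for vcpus in vm_vcpus:
        candidates = [i for i in range(server_count) if free[i] >= vcpus]
        if not candidates:
            placement.append(None)
            continue
        # Prefer the server with the least free capacity that still fits,
        # breaking ties by the lowest server index.
        target = min(candidates, key=lambda i: (free[i], i))
        free[target] -= vcpus
        placement.append(target)
    return placement
```

For example, four VMs needing 4, 4, 8, and 4 vCPUs on 16-vCPU servers would fill the first server completely before a second one is touched.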

During the SOSM resource allocation simulation, utilisation was more
balanced between CPUs and accelerators. More specifically, accelerators
tended to be utilised at the same levels as CPUs, and in many cases their
utilisation percentages exceeded the corresponding CPU ones. This is
because the system (SOSM framework) decides the resources (and types
of implementations) to be allocated for a task according to the
predefined assessment functions, which target (a) improved service
delivery, (b) computational efficiency, (c) improved energy consumption,
and (d) efficient management of underlying resources. Since accelerators
are more efficient in terms of both computation and energy consumption,
the system's preference for them is unsurprising.

It can also be seen that the total number of rejected tasks was very low
(~0.05%), while the total estimated energy consumption of the cloud was
close to the estimates for the traditional delivery system, due to the
utilisation of the energy-efficient accelerators. Thus, the SOSM-based
cloud environment was able to execute more tasks while consuming almost
equal energy. This was expected, since SOSM selects the most efficient
resources, which execute tasks faster, are freed sooner, and
consequently allow more tasks to be accepted.

In order to examine the energy efficiency of the two resource allocation 
techniques in more detail, the ratio of the total energy consumption of the 
data centre over the number of accepted tasks was computed for all experi- 
ments. In Table 5.6, the number of Wh that is consumed per task for all 
configurations is presented. It can be observed that the number of Wh per 
task is substantially smaller when the SOSM engine is used. This is due to 
the fact that when the SOSM engine is not used, the resources that are 
utilised are selected randomly, while with the SOSM engine the resources 


Table 5.6 Ratio of the total energy consumption of the cloud over the number
of accepted tasks for all configurations

Configuration   Cells   Wh per task without SOSM   Wh per task with SOSM
                        3030.26314                 521.38713
                        1633.66523                 539.09557
                        1268.57330                 449.99035
                        1044.89027                 566.98702
                        1333.53448                 553.39248
                        15,744.33657               1167.88194
                        10,735.85269               1195.96974
                        6239.39226                 1118.47856
                        4842.13695                 1233.55551
                        6982.59563                 1195.58945
                        142,955.84119              8326.39716
                        93,556.77024               8345.17402
                        53,754.62934               8276.59419
                        39,736.79489               8371.74376
                        60,682.07036               8337.67258

Configuration 1: 11,000 resources per Cell
Configuration 2: 110,000 resources per Cell
Configuration 3: 1,100,000 resources per Cell
are selected by the system, according to the predefined strategies; thus, the
most energy efficient solution is always chosen. 
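The per-task energy comparison can be reproduced directly from the values reported in Table 5.6. The sketch below is illustrative: the function and variable names are hypothetical, while the numbers are copied from the table in row order.

```python
# (Wh per task without SOSM, Wh per task with SOSM), in the row order of
# Table 5.6.
WH_PER_TASK = [
    (3030.26314, 521.38713),
    (1633.66523, 539.09557),
    (1268.57330, 449.99035),
    (1044.89027, 566.98702),
    (1333.53448, 553.39248),
    (15744.33657, 1167.88194),
    (10735.85269, 1195.96974),
    (6239.39226, 1118.47856),
    (4842.13695, 1233.55551),
    (6982.59563, 1195.58945),
    (142955.84119, 8326.39716),
    (93556.77024, 8345.17402),
    (53754.62934, 8276.59419),
    (39736.79489, 8371.74376),
    (60682.07036, 8337.67258),
]


def wh_per_task(total_energy_wh, accepted_tasks):
    """Ratio of total data centre energy (Wh) to the number of accepted tasks."""
    if accepted_tasks <= 0:
        raise ValueError("at least one accepted task is required")
    return total_energy_wh / accepted_tasks


def reduction_factors(rows=WH_PER_TASK):
    """How many times less energy per task is used with SOSM, per experiment."""
    return [without / with_sosm for without, with_sosm in rows]
```

Every factor returned by `reduction_factors()` is greater than one, which is the sense in which the Wh per task is smaller in all experiments when the SOSM engine is used.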

In Figs. 5.6 and 5.7, time-dependent charts are presented for the last 
experiment of the third configuration (eight Cells, 1,100,000 servers per 
Cell). In Fig. 5.6, the energy consumption, the processor utilisation, the 
accelerator utilisation, and the network utilisation of the cloud are pre- 
sented with respect to the simulated time for the traditional centralised 
cloud service delivery. In Fig. 5.7, the energy consumption, the processor 
utilisation, the accelerator utilisation, and the network utilisation of the 
cloud are presented through the simulation time when using the SOSM 
engine. 


5.5 CONCLUSION 


This chapter presented the work towards demonstrating the scalability of
the CloudLightning simulation framework. Cloud simulation tools were
examined, since demonstrating scalability in hyperscale clouds is
infeasible. The design and implementation of the CloudLightning simulation
framework were presented, a framework that overcomes the limitations of
the existing simulation platforms. The main innovations of the framework
lie in the fact that it is implemented for parallel computing systems (using 
MPI and OpenMP), it is based on a time-advancing loop instead of a dis- 
crete sequence of events, it allows the integration of dynamic resource 
allocation systems such as SOSM, and it supports hybrid CPU-accelerator 
resources. Finally, the CloudLightning Simulator was developed to be eas- 
ily extensible, since the time-advancing execution model allows the inte- 
gration of any strategies or phenomena observed in cloud environments. 
From the experiments that were performed, the CloudLightning
simulator was found to be capable of simulating clouds with large numbers
of resources. Different executions were performed with the traditional cloud
delivery system and with the use of the SOSM framework, for various
numbers of resources and Cells. Both the simulation platform and the
SOSM framework were found to be scalable; simulations up to 8,800,000 
hardware resources grouped into eight Cells were performed, only limited 
by the available hardware used for experimentation. SOSM was found to 
provide a more balanced distribution of tasks on the available hardware 
resources, with a much lower number of total rejected tasks. The energy 
consumption was found to be equivalent to the energy consumed when 
simulating a traditional cloud delivery system; however, the SOSM system 
was able to service a significantly larger number of tasks. Thus, the energy
consumed per task in the SOSM system was substantially reduced com- 
pared to the traditional approach. 
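The difference between a time-advancing loop and a discrete sequence of events, mentioned earlier in this section, can be illustrated with a toy example. The sketch below is not the CloudLightning Simulator, which is implemented for parallel systems using MPI and OpenMP; the function name, the single shared capacity, and the fixed time step are all assumptions made for illustration.

```python
def run_time_advancing(tasks, capacity, time_step=1.0, end_time=10.0):
    """Advance simulated time in fixed steps, updating state at every step.

    `tasks` is a list of (arrival_time, duration) pairs and `capacity` is the
    number of tasks that can run at once. Returns (accepted, rejected).
    """
    pending = sorted(tasks)
    running = []  # finish times of the tasks currently executing
    accepted = rejected = 0
    now = 0.0
    while now < end_time:
        # Release the resources of tasks that have finished by now.
        running = [finish for finish in running if finish > now]
        # Admit or reject every task that has arrived up to the current time.
        while pending and pending[0][0] <= now:
            _, duration = pending.pop(0)
            if len(running) < capacity:
                running.append(now + duration)
                accepted += 1
            else:
                rejected += 1
        now += time_step
    return accepted, rejected
```

Because the state is revisited at every step rather than only when an event fires, additional per-step behaviour (for example, power accounting or a resource allocation strategy) can be added inside the loop without changing its structure.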

The CloudLightning Simulator and Simulator Visualization Tool are
available for download under the Apache 2 open source licence at
https://bitbucket.org/cloudlightning/cloudlightning-simulator and
https://bitbucket.org/cloudlightning/cl-simulatorvisualization, respectively.
Clayton Christensen, in his seminal study on the disk drive industry,
identified two types of technological change. Sustaining technologies sustained
the industry’s rate of improvement in product performance and ranged in 
difficulty from incremental to radical, whereas so-called disruptive innova- 
tions redefined performance trajectories and consistently resulted in the 
failure of the industry’s leading firms (Christensen 1997). Cloud comput- 
ing continues to transform, and democratise access to, the use of informa- 
tion and communications technology infrastructure. Organisations of all 
sizes and sectors, as well as the general public, are able to exploit the 
advantages of the agility and scalability (up and down) inherent in cloud 
computing to work more efficiently, reduce Information Technology (IT) 
costs (including IT capital expenditure, maintenance and support costs, 
and related environmental costs), and support resilience, business
continuity, and growth (Hogan et al. 2011; Low et al. 2011; Buyya et al. 2009;
Leimbach et al. 2014). This book is about disruptive potential: (i) the
potential of cloud computing to disrupt the high performance computing
(HPC) sector and (ii) the potential of a new heterogeneous cloud archi- 
tecture based on the concepts of self-organisation, self-management, and 
the separation of concerns to disrupt extant cloud resource management 
approaches. 

For a significant portion of the last half-century, HPC exploited rela- 
tively established trajectories of performance; single-thread processor 
clock frequency was viewed as the main driving factor behind increasing 
computational performance. Manufacturers of such processors, and Intel 
in particular, delivered consistent improvements in performance until hit- 
ting a scientific “power wall” for single-core processors at the turn of the 
century. With the levelling off of single-thread processor performance, the 
industry sought to sustain performance trajectories by combining multiple 
Central Processing Unit (CPU) cores on one chip to achieve performance 
gains. While multi-core architectures achieve performance gains, efficient 
parallel computation on multiple cores provided discrete challenges for 
the HPC end user community. More recently, the use of different types of 
processor has been exploited to address this issue. As different compute
resources can have different properties, applications with diverse
characteristics can be executed more quickly and more efficiently using
these processors.

Heterogeneous architectures support these specialist processors as co- 
processors to a host processor; the host processor can complete one 
instruction stream, while the co-processor can complete a different 
instruction stream or different type of stream (Eijkhout et al. 2016). While
such heterogeneous resources can provide new measures of performance, 
for example, energy efficiency, both technically and culturally the HPC 
community remains focused on maximising the (effective) processing 
speed of a given architecture to orders of magnitude greater than general- 
purpose computing. Whereas each evolution of the processor architecture 
was relatively novel in the context of difficult HPC applications, it was not 
disruptive. To paraphrase Christensen (1997), the customers of the lead- 
ing HPC supplier led them towards these achievements. These sustaining 
technologies did not precipitate failure by incumbents or significant 
changes in the HPC industry structure. Size still matters. The HPC com- 
munity remains dominated by a relatively small number of suppliers cater- 
ing for a relatively small number of large organisations requiring significant 
investments in infrastructure. For the most part, access to HPC remains 
restricted by architectural complexity, availability of trained personnel, and 
budgetary issues (Intersect360 Research 2014). 

In the last few years, cloud service providers (CSPs) have sought to 
enter the HPC market; however, HPC has remained one of the smallest 
segments in the market. This can be explained by both technical and cul- 
tural perceptions on the nature of HPC and the efficacy of cloud comput- 
ing architectures to deliver high performance. From a technical perspective, 
many HPC workloads are not ready to run on today’s cloud architectures, 
and provisioning of HPC clusters in the cloud still typically requires deep 
IT knowledge. Similarly, many in the HPC community do not believe a 
general-purpose distributed architecture designed for multi-tenancy, hori- 
zontal scaling, and minimal interference with physical infrastructure can 
deliver the performance expectations for HPC. And this may be correct. 

However, there are classes of HPC users who do not need maximum 
performance, and this goes to the core of the disruptive potential of cloud 
computing for HPC. Cloud computing creates new markets and value 
networks for organisations (and individuals) who cannot afford or cannot 
gain convenient access to traditional HPC infrastructure such as super- 
computers, who have loosely coupled workloads that can be scaled hori- 
zontally, and/or have pent-up HPC demand and find it difficult to burst 
capacity for overflow or surge workloads with their existing HPC infra- 
structure. Given the impact HPC has on scientific discovery and innova- 
tion, dramatically increasing access and use of HPC through the cloud to 
this wider community of low-end consumers or non-consumers has the 
potential to drive significant societal and economic impact. 
At the same time, it is questionable whether the economic model of
conventional hyperscale cloud computing is sustainable in the long term,
not from a business or technology perspective but from an environmental
perspective. The IT sector accounts for a significant portion of global
electricity consumption, with some estimates at approximately 7% (Corcoran and Andrae
2013). Data centres have an extremely energy-intensive profile. For exam- 
ple, a study conducted for the US Department of Energy estimates that 
data centres consume 10-50 times the energy per floor space of a typical 
commercial office building and collectively (Darrow and Hedman 2009). 
In 2014, data centres accounted for 1.8% of total US electricity consump- 
tion driven by increased Internet usage and the rise of cloud computing 
(Shehabi et al. 2016). Research suggests that the data centre sector, and 
hyperscale data operators specifically, has taken significant measures to 
improve energy efficiency including increasing server productivity and 
utilisation and efficiency improvements in storage, network, and data cen- 
tre infrastructure operations such as cooling (Shehabi et al. 2016). Despite 
these initiatives, the environmental impact of Information and 
Communications Technologies (ICT) operations, data centres, and cloud 
energy usage remains a significant concern and increased focus of policy 
makers and civic society. 

Research suggests that existing measures for greater data centre energy 
efficiencies will reach theoretical and practical limits in the near future, and 
therefore cloud computing especially needs to look beyond its current 
model of using one-size-fits-all hardware towards optimising hardware for 
specific workloads (Shehabi et al. 2016). Such optimisation is central to 
the heterogeneous cloud; however, such a vision for cloud computing 
increases the complexity of managing cloud infrastructure dramatically. As 
such, a new paradigm for cloud computing architectural design is required. 

This book presents one possible architectural design, CloudLightning,
for managing heterogeneous clouds based on self-organisation,
self-management, and the separation of concerns. CloudLightning is a
fundamentally different architecture to the homogeneous cloud platforms
prevalent today. Specifically, it both accommodates workload variation 
through optimised heterogeneous hardware and hides this complexity 
from enterprise application developers and end users, thus providing a dif- 
ferent package of attributes including not only hardware performance but 
energy efficiency, ease of management, and ease of use as well. 
CloudLightning’s disruptive potential is the new performance trajectory 
that such attributes create. 